Interlaboratory Study Characterizing a Yeast Performance Standard for Benchmarking LC-MS Platform Performance*

Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix, thereby providing a metric to evaluate and minimize preanalytical and analytical variation in comparative proteomics experiments.



Molecular & Cellular Proteomics 9:242-254, 2010.
Access to proteomics performance standards is essential for several reasons. First, to generate the highest quality data possible, proteomics laboratories routinely benchmark and perform quality control (QC) monitoring of the performance of their instrumentation using standards. Second, appropriate standards greatly facilitate the development of improvements in technologies by providing a timeless standard with which to evaluate new protocols or instruments that claim to improve performance. For example, it is common practice for an individual laboratory considering purchase of a new instrument to require the vendor to run "demo" samples so that data from the new instrument can be compared head to head with existing instruments in the laboratory. Third, large scale proteomics studies designed to aggregate data across laboratories can be facilitated by the use of a performance standard to measure reproducibility across sites or to compare the performance of different LC-MS configurations or sample processing protocols used between laboratories to facilitate development of optimized standard operating procedures (SOPs).
Most individual laboratories have adopted their own QC standards, which range from mixtures of known synthetic peptides to digests of bovine serum albumin or more complex mixtures of several recombinant proteins (1). However, because each laboratory performs QC monitoring in isolation, it is difficult to compare the performance of LC-MS platforms throughout the community.
Several standards for proteomics are available for request or purchase (2,3). RM8327 is a mixture of three peptides developed as a reference material in collaboration between the National Institute of Standards and Technology (NIST) and the Association of Biomolecular Resource Facilities. Mixtures of 15-48 purified human proteins are also available, such as the HUPO (Human Proteome Organisation) Gold MS Protein Standard (Invitrogen), the Universal Proteomics Standard (UPS1; Sigma), and CRM470 from the European Union Institute for Reference Materials and Measurements. Although defined mixtures of peptides or proteins can address some benchmarking and QC needs, there is an additional need for more complex reference materials to fully represent the challenges of LC-MS data acquisition in complex matrices encountered in biological samples (2,3).
Although it has not been widely distributed as a reference material, the yeast Saccharomyces cerevisiae proteome has been extensively used by the proteomics community to characterize the capabilities of a variety of LC-MS-based approaches (4-15). Yeast provides a uniquely attractive complex performance standard for several reasons. Yeast encodes a complex proteome consisting of ~4,500 proteins expressed during normal growth conditions (7, 16-18). The concentration range of yeast proteins is sufficient to challenge the dynamic range of conventional mass spectrometers; the abundance of proteins ranges from fewer than 50 to more than 10^6 molecules per cell (4,15,16). Additionally, it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins (5,9,16,17,19,20) as well as LC-MS/MS data sets showing good correlation between LC-MS/MS detection efficiency and the protein abundance estimates (4,11,12,15). Finally, it is inexpensive and easy to produce large quantities of yeast protein extract for distribution.
In this study, we describe large scale production of a yeast S. cerevisiae performance standard, which we offer to the community through NIST. Through a series of interlaboratory studies, we created a reference data set characterizing the yeast performance standard and defining reasonable performance of ion trap-based LC-MS platforms in expert laboratories using a series of performance metrics. This publicly available data set provides a basis for additional laboratories using the yeast standard to benchmark their own performance as well as to improve upon the current status by evolving protocols, improving instrumentation, or developing new technologies. Finally, we demonstrate how the yeast performance standard, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix.

EXPERIMENTAL PROCEDURES
Generation of Yeast Protein Performance Standard-An SOP for preparation of the yeast performance standard was developed based on the approach of Piening et al. (12) with modifications to allow for scale-up. Production was outsourced to Boston Biochem (Cambridge, MA). The full protocol is given in supplemental Section A; initial characterization of the preparation is presented in supplemental Section B. In brief, S. cerevisiae strain BY4741 (MATa, leu2Δ0, met15Δ0, ura3Δ0, his3Δ1) was grown in a 10-liter batch of rich (yeast extract peptone dextrose) medium at 30°C in a fermentor to an A600 of 0.93. The yeast were harvested by continuous flow centrifugation (yield, 5.4 g wet weight), and the cell pellet was washed three times with ice-cold water. The cells were lysed by incubation with ice-cold trichloroacetic acid (10% final concentration in 160-ml total volume) for 1 h at 4°C. The protein precipitate was collected by centrifugation, washed twice with 160 ml of cold 90% acetone, and pelleted again. The resulting material was lyophilized and stored at −80°C. The total yield of lyophilized yeast lysate was ~0.75 g.
Preparation of Digested Yeast Lysate for Study Sample Preparation-Lyophilized yeast lysate (~11 mg) was reconstituted in 50 mM ammonium bicarbonate containing 2 mg/ml RapiGest SF (Waters), heated at 60°C for 45 min, and sonicated for 5 min on ice. Next, 50 mM DTT in 50 mM ammonium bicarbonate was added to yield a final DTT concentration of 5 mM, and the sample was incubated at 60°C for 30 min. After cooling to room temperature, 200 mM iodoacetamide in water was added to yield a final concentration of 10 mM, and the alkylation reaction was left to proceed at room temperature in the dark for 30 min. To quench alkylation, 100 mM DTT in 50 mM ammonium bicarbonate was added to the sample to yield a final concentration of 10 mM. Prior to the addition of trypsin, an additional volume of 50 mM ammonium bicarbonate was added to the sample to reduce the RapiGest concentration to 0.1%. Trypsin (0.5 µg/µl in 20 mM aqueous HCl) was then added to the yeast lysate sample in a 1:50 ratio to the total protein amount. The sample was digested overnight (18 h) at 37°C with gentle swirling. After digestion, to inactivate trypsin and cleave the RapiGest, concentrated trifluoroacetic acid was added to the sample to yield a concentration of 0.5%. The sample was then incubated again at 37°C for 60 min followed by centrifugation at 10,000 rpm for 10 min. The supernatant was transferred to a new sample tube and lyophilized to dryness; after lyophilization, the dried digest was resuspended in 0.1% aqueous formic acid to yield a concentration that would correspond to ~60 ng/µl total yeast protein prior to digestion. Where indicated, 48 human proteins (Sigma UPS1) were spiked into the reconstituted yeast performance standard (supplemental Section C).
LC-MS/MS Methods-Each laboratory was asked to follow an SOP for collection of all data in Study 6. A detailed description of the SOP is provided in supplemental Section C. Parameters and settings specified in the SOPs were derived by a combination of consensus among the participants and limited method optimization studies. The SOPs do not represent fully optimized methods and are not intended to be prescriptive for the field. The SOPs were used instead to minimize variation due to factors that could be anticipated and controlled. Each laboratory was allowed to use its own favorite protocol for Study 8, and the individual protocols are summarized in supplemental Section D. Four models of mass spectrometer were used: LTQ, LTQ-XL, LTQ-XL-Orbitrap, and LTQ-Orbitrap (see supplemental Section J). In each case, MS/MS spectra were collected in the LTQ. For the LTQ-Orbitrap instruments, MS1 spectra used to determine the precursors selected for MS/MS were collected at 60,000 resolution in the Orbitrap. These high resolution scans enabled precursor selection to be limited to precursors that exhibited both a charge of 2+ or higher and an isotope cluster from which the monoisotopic peak could be discerned. The low resolution MS1 scans on LTQ instruments did not enable these precursor selection criteria. A complete description of the acquisition parameters and other instrument configuration parameters can be found in supplemental Sections C, D, and J.
Database Search Pipeline-For Studies 6 and 8, centroided tandem mass spectra were converted to peak lists in mzXML format by the msConvert tool of ProteoWizard 1.6.0 (21). The software was configured to centroid MS scans. Peptides were identified against the S. cerevisiae Genome Database orf_trans_all, downloaded April 6, 2007. These 6,718 sequences were augmented by 48 UPS1 sequences (Sigma) (supplemental Section E), 23 NCI20 sequences (supplemental Section F), and 74 contaminant protein sequences; the full database was then doubled in size by adding the reversed version of each sequence. The FASTA file is available at http://cptac.tranche.proteomecommons.org/. The MyriMatch database search algorithm version 1.6.0 (22) matched tandem mass spectra to peptide sequences. Semitryptic peptide candidates were included as possible matches. The configuration defined proteolytic cleavage sites after any Lys or Arg (whether or not Pro was the next residue) or after a Met at the N terminus of a protein, allowing for up to two missed cleavages. Potential modifications included oxidation of methionines, formation of N-terminal pyroglutamine, deamidation of Asn-Gly motifs, and carbamidomethylation of cysteines, all as variable modifications. For the LTQ, precursors were allowed to be up to 1.25 m/z from the average mass of the peptide. For the Orbitrap, precursor ions were required to fall within 10 ppm of the database peptide with ppm computed from m/z values. To retain identifications in which the peptide monoisotope had been miscalled by the instrument control software, MyriMatch also sought matches in which a neutron had been added to or subtracted from each database peptide. Fragment ions were uniformly required to fall within 0.5 m/z of the monoisotope.
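The construction of the search database described above (target sequences plus one reversed copy of each) can be illustrated with a minimal sketch. The function name, the `rev_` decoy prefix, and the toy sequences are assumptions made for illustration only; they are not taken from the FASTA file used in the study.

```python
def add_reversed_decoys(proteins, decoy_prefix="rev_"):
    """Return the target entries plus one reversed decoy per target.

    `proteins` maps an accession to its amino acid sequence.
    The decoy prefix is a hypothetical naming convention.
    """
    combined = dict(proteins)
    for name, seq in proteins.items():
        combined[decoy_prefix + name] = seq[::-1]  # reversed sequence
    return combined


# Entry counts reported in the text: yeast ORFs + UPS1 + NCI20 + contaminants.
n_targets = 6718 + 48 + 23 + 74
n_total = 2 * n_targets  # database doubled by the reversed entries

# Hypothetical toy sequences, used only to show the mechanics.
toy = {"toy_yeast_1": "MKLSTR", "toy_human_1": "MKWVTF"}
db = add_reversed_decoys(toy)
```

With the counts given in the text, this corresponds to 6,863 target entries and 13,726 entries in the doubled database.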
IDPicker version 2.5 (23, 24) applied a 2% false identification rate per raw file at the peptide-spectrum match level and applied parsimony to the protein lists, requiring all proteins to match at least two distinct peptide sequences and to match at least 13 spectra (one per instrument per study). The two-peptide rule was applied globally, not by instrument. Hence, proteins on the list might have a single peptide for a given instrument. In contrast, the two-peptide rule was applied per raw file in the statistics code that generated the outputs displayed in Tables I and III as well as Figs. 2 and 3. IDPicker reports can be downloaded from http://cptac.tranche.proteomecommons.org/.
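To make the idea of a score threshold at a fixed false identification rate concrete, the following is a minimal sketch of target-decoy filtering with a reversed-sequence database. It assumes the commonly used estimator FDR = 2 x decoys / (targets + decoys) and a single score where higher is better; it is not a reproduction of IDPicker's actual procedure, and the function name and data layout are hypothetical.

```python
def psms_passing_fdr(psms, max_fdr=0.02):
    """Count top-ranked peptide-spectrum matches kept at a given FDR.

    `psms` is a list of (score, is_decoy) tuples, higher score = better.
    Returns the size of the largest score-ranked prefix whose estimated
    FDR, 2 * decoys / (targets + decoys), does not exceed `max_fdr`.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    decoys = 0
    best = 0
    for n_seen, (_, is_decoy) in enumerate(ranked, start=1):
        decoys += is_decoy  # True counts as 1
        if 2 * decoys / n_seen <= max_fdr:
            best = n_seen
    return best


# Hypothetical example: ten target hits, then one decoy, then one target.
example = [(20 - i, False) for i in range(10)] + [(10, True), (9, False)]
n_kept = psms_passing_fdr(example)
```

In this toy example the first decoy pushes the estimated rate above 2%, so only the ten best-scoring matches are retained.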
Statistical Methods-For Table I, we used a dimensionless measure, the coefficient of variation, to compare the within and between laboratory variation of the total number of spectra, sequences, and the total number of proteins. The coefficient of variation is the normalized measure of dispersion, computed as the ratio of the standard deviation to the mean of the data. The within lab CV% is 100 times the ratio of the standard deviation of the response over all runs from a given lab to the mean for that lab. The between lab CV% is 100 times the ratio of the standard deviation of the lab means to the overall mean. For Tables I and II, a summary of the performance metrics is provided in supplemental Section G; a detailed description of these (and additional) metrics can be found in the accompanying study by Rudnick et al. (32). Furthermore, the software pipeline used to calculate the performance metrics is available for download from NIST.
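The two coefficients of variation defined above can be sketched as follows. This is an illustration under stated assumptions: the sample standard deviation is used (the text does not say whether the sample or population form was applied), the overall mean is taken as the mean of the lab means (identical to the grand mean when every lab contributes the same number of runs), and the run counts are hypothetical.

```python
from statistics import mean, stdev


def within_lab_cv(runs):
    """CV% of replicate runs from one lab: 100 * SD / mean."""
    return 100 * stdev(runs) / mean(runs)


def between_lab_cv(runs_by_lab):
    """CV% across labs: 100 * SD of the lab means / mean of the lab means."""
    lab_means = [mean(runs) for runs in runs_by_lab.values()]
    return 100 * stdev(lab_means) / mean(lab_means)


# Hypothetical peptide counts for triplicate runs in two labs.
counts = {"lab_A": [90, 100, 110], "lab_B": [190, 200, 210]}
cv_within_a = within_lab_cv(counts["lab_A"])
cv_between = between_lab_cv(counts)
```

For these made-up numbers the within lab CV for lab_A is 10%, whereas the between lab CV is about 47%, the pattern of smaller within lab than between lab variation that the study reports for most parameters.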
For Table III, we treated the spiked sample as a case sample and the pure yeast reference sample as a control sample. For each sample, we used the data resulting from all 21 independent LC-MS/MS replica runs (nine from LTQ instruments and 12 from Orbitrap instruments). The data were analyzed using the SASPECT (significant analysis of peptide counts) method described elsewhere (25). Briefly, SASPECT uses a probability model to make inferences about protein abundances based on peptide detection, and permutation testing is used to estimate the false discovery rate (FDR) of the analysis.
For Fig. 2, logistic regression was used to evaluate the association between TAP copy number and detection of yeast proteins in Studies 6 and 8. ("TAP copy number" refers to the estimated molecules per cell of a given protein, derived from the data of Ghaemmaghami et al. (16).) The logarithm (base 10) of copy number was used as the regressor to improve model fit to the data. Although TAP copy number estimates include measurement error (16), this uncertainty was not reflected in the logistic regression analysis. We report a summary measure of performance for each RPLC run, the CN50. This statistic estimates the copy number corresponding to 50% probability of detection for a randomly selected yeast protein. Smaller CN50 values indicate greater depth of proteome sampling. CN50 is easily derived from the logistic regression coefficients. Mixed effects linear models were used for the subsequent analysis of CN50 values and for yeast peptide and protein counts in Studies 6 and 8. This procedure allows simultaneous estimation of potential effects of protein spikes as well as inter- and intrainstrument variability. (See supplemental Section H for the statistical analysis code for computing the CN50 values as well as a complete table of CN50 values.)
Public Access to Data-To manage the large number of data files generated for these studies, a password-protected web site was developed. This site, hosted at NIST, was designed as a portal used by the participating laboratories for initiating uploads and downloads of large data files. Information for each team including labs and instruments was preloaded into the system, stored in a MySQL database. At the beginning of each study, the participating instruments at each lab were added to the study, creating hyperlinks by which data could be uploaded. Importantly, all uploads were then unambiguously tied to the originating instrument and study along with a date stamp.
The data transfers were performed using Tranche, an open source, secure peer-to-peer file sharing tool. A custom user interface for use by participating laboratories was developed and added to the Tranche code base. This custom tool allowed the web site and database to communicate tracking information with Tranche via custom URLs. When uploads finished, the Tranche hash (a unique data identifier) and pass phrase were automatically recorded into the web site's database. These stored links allow for subsequent retrieval of the data files using the Tranche download tool. The Tranche hashes and pass phrases provide a simple and portable way to access data sets, including relatively large data sets, and can be easily associated with supporting annotation. Once published, the data will be made available to the community via Tranche at the data archival page, http://cptac.tranche.proteomecommons.org/.
Availability of Yeast Performance Standards-The certified yeast reference material for proteomics use is under development at NIST to meet the long term needs of the community and will be available in 2010. In the interim, aliquots of the yeast performance standard described in this study are available through NIST (see Footnote 2).

Production and Characterization of Yeast Performance Standard-A standard operating procedure for preparation of the yeast performance standard was developed based on the approach of Piening et al. (12) with modifications to allow for scale-up (supplemental Section A). Production was outsourced to a commercial vendor, and the protein preparation was transferred to NIST for initial characterization (supplemental Section B). In total, 0.75 g of protein was obtained from 10 liters of yeast culture. The certified yeast reference material for proteomics use is under development at NIST to meet the long term needs of the community. In the interim, aliquots of the yeast performance standard described in this study are available through NIST.
Identical aliquots of the trypsin-digested performance standard were distributed to multiple participating laboratories to generate a data set characterizing the performance standard and determining the degree of variation in the performance of LC-MS platforms across participating laboratories. Five independent laboratories, seven independent instruments, and four instrument models (LTQ, LTQ-XL, LTQ-XL-Orbitrap, and LTQ-Orbitrap) were included in the study. Two sets of analyses were performed (Fig. 1). For both sets, all sample processing (i.e. trypsin digestion, alkylation, and reduction) was done centrally at NIST, and all data were submitted and analyzed through a single analysis pipeline; hence, any experimental variation between laboratories was due to differences in LC-MS performance. In one set of analyses ("Study 6"), each sample was run in triplicate (120 ng loaded on column), and each laboratory was asked to follow a predefined SOP dictating HPLC and MS parameters (supplemental Section C). In addition, a series of samples was generated in which a mixture of 48 human proteins (Sigma UPS1) was spiked into the yeast performance standard at several concentrations, and each of these spiked samples was also analyzed in triplicate using the SOP. In a second set of analyses ("Study 8"), each participating laboratory was asked to perform six shotgun MS/MS runs of the performance standard (three runs of each of 120 and 600 ng loaded on column) with each laboratory using its own favorite LC-MS protocol.
The results for the unspiked yeast performance standard are summarized in Table I. The unspiked samples in Study 6 produced 72,743 identified spectra that translated into 11,822 distinct peptides. (When spiked samples from Study 6 were also included, the search produced 407,836 high confidence identifications accounting for 17,904 distinct peptides.) Study 8 (all samples) produced 191,311 identified spectra for 17,333 distinct peptides. In Study 8, 120 ng on column yielded 76,808 identified spectra related to 11,622 peptides, and for the cases in which 600 ng were loaded, the search rendered 113,892 high confidence identifications accounting for 15,464 distinct peptides.
As expected, the Orbitrap instruments identified significantly more peptides than the LTQ instruments across all three data sets (Table I, parts A-C; p value = 0.013, t test).
Footnote 2: Aliquots can be requested from NIST (proteome@nist.gov).
FIG. 1. Overview of analyses of yeast performance standard. Samples were processed centrally at NIST, and identical aliquots were distributed to the participating laboratories for LC-MS/MS analyses. Each sample was analyzed in triplicate, and the data were processed and analyzed centrally using a single analysis pipeline. For Study 6, each laboratory conformed to a prespecified SOP (supplemental Section C). For Study 8, no SOP was instituted, and the individual methods of each laboratory are described in supplemental Section D.
Four performance metrics (described in supplemental Section G and in Table II), designed to diagnose LC-MS issues, are provided for individual instruments. C-3A is the median peak width for unique peptides; C-2A is the retention period over which 50% of the identified peptides eluted; DS-2B is the number of MS2 spectra produced over C-2A; IS-3B is the ratio of the number of 3+/2+ charge states for all peptide identifications.
Detection sensitivity could also be monitored using the performance standard. The abundance of ~3,800 yeast proteins has been estimated by quantitative Western blotting (16), and previous studies have shown strong correlation between these protein abundance measurements and detection efficiency by LC-MS (4,12). This strong correlation is recapitulated in our analyses of the yeast performance standard (Fig. 2) with the Orbitrap instruments overall showing slightly higher detection efficiencies than the LTQ instruments (Table I, CN50 values; p value = 0.029, t test). Intra- and interlaboratory variation was also calculated from the data set. Across the vast majority of parameters (Table I), intralaboratory variation was smaller than interlaboratory variation. For example, for depth of proteome sampling (Table I, CN50 values), intralaboratory variation was significantly lower than interlaboratory variation (p value ≤0.028 by one-sided paired two-sample t test).
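The CN50 values referred to above follow directly from the coefficients of a logistic regression of detection on log10(copy number): setting the detection probability to 0.5 makes the linear predictor zero, so log10(CN50) = -intercept / slope. The sketch below illustrates this relationship only; the function names and the coefficient values are hypothetical, and the study's own analysis code is given in its supplemental Section H.

```python
import math


def detection_probability(copy_number, intercept, slope):
    """Logistic model with log10(copy number) as the regressor."""
    z = intercept + slope * math.log10(copy_number)
    return 1.0 / (1.0 + math.exp(-z))


def cn50(intercept, slope):
    """Copy number at which the modeled detection probability is 50%."""
    return 10 ** (-intercept / slope)


# Hypothetical coefficients for one RPLC run (not taken from the study).
b0, b1 = -9.0, 2.0
cn50_value = cn50(b0, b1)  # 10 ** 4.5, roughly 31,600 copies per cell
p_at_cn50 = detection_probability(cn50_value, b0, b1)
```

A steeper slope or a less negative intercept shifts the curve to the left, which lowers CN50 and corresponds to greater depth of proteome sampling.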

Use of Performance Metrics for Diagnosing LC-MS Platform Problems-In work leading up to these interlaboratory studies, occasional runs were observed wherein the number of high confidence peptide identifications was significantly less than expected based on the bulk of data. In an effort to identify and monitor the factors contributing to this variability, a set of LC-MS performance metrics was developed. These metrics, described in detail in the accompanying study by Rudnick et al. (32), fall into six classes (chromatography, dynamic sampling, ion source, MS1 signal, MS2 signal, and peptide identifications), each designed to monitor the performance of a different aspect of the LC-MS platform (Table II).
These metrics were computed for the yeast proteome reference data set. The 15 metrics showing the highest variation across the yeast reference data set are shown in Table II, and four of these (C-3A, C-2A, DS-2B, and IS-3B) are reported for individual instruments in Table I.
FIG. 2. [...] (16)); the abundance of proteins ranges from fewer than 50 to more than 10^6 molecules per cell, and only 9% of yeast proteins have copy numbers greater than 20,000. Logistic regression curves (in color) indicate the probability of protein detection as a function of copy number for each run. The vertical lines indicate the mean copy number (for each instrument) corresponding to 50% probability of detection (CN50). Smaller CN50 values indicate greater depth of proteome sampling. The graph indicates that, on average, only the most abundant yeast proteins have a high probability of being detected in this one-dimensional LC-MS analysis. a shows the results for high protein loading (600 ng on column) when each lab uses their typical (non-SOP) testing protocol (Study 8). b shows the results for low protein loading (120 ng on column) using the same (non-SOP) protocol. c shows the results obtained from the SOP (Study 6; 120 ng on column). As expected, CN50 is increased in the low protein loading group compared with the high loading group (p < 0.0001). For equivalent "detectability," a randomly selected protein must be present at nearly 40,200 copies per cell in the low loading group versus 24,650 copies per cell at high loading (95% confidence interval for the difference 11,150 to 20,470).
[...] (32)) was computed (from the historical benchmarking data sets described herein) to demonstrate average values obtained for both the LTQ instruments (part A) and the Orbitrap instruments (part B). Intra- and interlaboratory variation is also shown.
The performance metrics can be used to diagnose the underlying cause when suboptimal data are obtained. For example, early in the course of these experiments, it was noted that data from the LTQ@73 instrument (laboratory identities are coded numerically) consistently produced fewer peptide identifications than the other instruments in the study. Once the performance metrics were calculated, it became apparent that the retention period over which 50% of peptides were identified (performance metric C-2A) on LTQ@73 was significantly shorter than the average for the other LC-MS/MS platforms. In the affected data, this prime zone for peptide identification lasted only 21.93 min for LTQ@73 compared with an average of 31.03 min for the other platforms, indicating a contraction of the chromatography. Upon closer examination of the chromatography system, a degraded pump seal and contaminated check valve were identified despite no indication of performance degradation in measured flow rate or pressure. Following repair of the HPLC system, manual recalibration of the flow rates, and implementation of a lower flow rate for sample loading on the column, the retention time duration of the inner half of peptide identifications increased to 31.3 min (supplemental Section I).
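As an illustration of how three of the metrics named above could be computed from a list of identified peptides, the following minimal sketch uses only the verbal definitions given in this article: C-2A as the retention period containing the middle 50% of identifications (the "inner half"), C-3A as the median peak width, and IS-3B as the ratio of 3+ to 2+ identifications. The function names, the input layout, and the choice of quartile interpolation are assumptions; the NIST pipeline described by Rudnick et al. (32) may differ in detail.

```python
from statistics import median, quantiles


def c2a(retention_times):
    """Width of the retention window holding the middle 50% of identifications."""
    q1, _, q3 = quantiles(retention_times, n=4, method="inclusive")
    return q3 - q1


def c3a(peak_widths):
    """Median chromatographic peak width of identified peptides."""
    return median(peak_widths)


def is3b(charges):
    """Ratio of 3+ to 2+ peptide identifications."""
    return charges.count(3) / charges.count(2)


# Hypothetical values for a handful of identified peptides.
rt_window = c2a([10, 20, 30, 40, 50])      # retention times
median_width = c3a([10, 12, 30])           # peak widths
charge_ratio = is3b([2, 2, 2, 3, 2, 1])    # precursor charge states
```

Tracked run to run, a contraction of the first value (as seen for LTQ@73) points to a chromatography problem, whereas a drop in the last value (as seen for Orbitrap@86) points to the ion source.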
In the transition from Study 6 (+SOP) to Study 8 (no SOP), the three instruments that had the most substantial changes to their procedures increased their numbers of matched spectra for the 120-ng sample by 56% (LTQ@73), 68% (Orbitrap@56), and 102% (Orbitrap@86). The other three instruments, all of which made more minor deviations from the SOP, increased or decreased their numbers of matched spectra by <20%. The number of peptides identified for LTQ@73, Orbitrap@56, and Orbitrap@86 correlated tightly with the median chromatographic peak width (metric C-3A; Table I, parts A and B), indicating the importance of this metric in peptide/protein identification yield. In both of these cases, smaller dimension packing material was used in these specific laboratories (supplemental Section D), accounting for the differences in peak widths observed. The sensitivity of Orbitrap@56 and Orbitrap@86 was also further improved by reducing the column diameters and flow rates from the Study 6 SOP configuration of 100-µm inner diameter at 600 nl/min. In Study 8, both instruments used 75-µm-inner diameter columns with flow rates of 200 (Orbitrap@56) and 400 nl/min (Orbitrap@86). Orbitrap@86 also had a further boost in Study 8 from improved ionization. In Table I, part C, for Study 6, the performance metrics flagged an ionization issue. Orbitrap@86 produced far fewer peptide identifications than the other Orbitraps in the study, and this instrument also showed a >3-fold reduction in the ratio of 3+/2+ peptide ions detected (Table I, part C, see IS-3B). The identification of triply charged peptides fell from an average of 30% of doubly charged peptides on comparable instruments to less than 10%, suggesting a problem during ionization.
Yeast Performance Standard Can Be Spiked with Exogenous Proteins to Assess Detection Efficiency-Western blotting is semiquantitative (16); hence, although we see good general correlation between protein copy number estimates and LC-MS detection efficiency (Fig. 2), we cannot determine the relationship between absolute protein concentrations and LC-MS detection efficiency using only the yeast proteins. To further quantify the detection sensitivity across the platforms, we spiked 48 human proteins (Sigma UPS1) into the yeast performance standard at a range of concentrations (0, 0.25, 0.74, 2.2, 6.7, and 20 fmol/µl) and determined the detection efficiency of these 48 equimolar proteins in the yeast matrix (Fig. 3). As expected, the number of human proteins detected increased with spike concentration (Fig. 3a). As the concentration of the spike proteins increased, the instruments showed a gradual decline in the number of yeast peptides/proteins identified, potentially due to ion suppression or competition with spiked peptides for MS/MS sequencing time (Fig. 3, b and c).

Yeast Standard, Spiked with Human Proteins, Can Be Used to Benchmark the Power of Proteomics Platforms for Detection of Differentially Expressed Proteins at Different Levels of Concentration in Complex Matrix-An increasingly common goal of proteomics is to discover proteins that are differentially expressed between two classes of samples. Frequently, biological samples are subjected to fractionation to improve depth of sampling. Even using standardized protocols, each step of sample processing introduces preanalytical variation that can lead to false positives and false negatives in comparative proteomics experiments, especially in label-free approaches where the two classes of samples are processed separately in parallel. Because of the high cost and effort to verify each potentially differentially expressed protein identified (26-28), it is imperative that the candidate discovery process is designed to minimize the FDR.
We used the Study 6 spiked yeast performance standard data to determine the FDR in a relatively simple scenario using a one-dimensional LC-MS/MS platform and a label-free approach to detect proteins differentially present in the Study 6 samples. This analysis is presented as a proof of concept for an approach that can also easily be applied to more complex analyses, such as those involving multidimensional separations. As described above, in Study 6, the standard human protein mixture was spiked into the yeast reference at five different levels. For each spike-in level, we compared the spiked sample with the unspiked yeast performance standard. Our goal was to detect the "differentially expressed" proteins between the two classes (spiked versus unspiked); because the matrix (yeast performance standard) was identical among all of the samples, only the spiked-in human proteins were differentially present. Hence, yeast proteins appearing as biomarkers are a measure of the false positive rate. Similarly, human proteins that the experimental work flow fails to identify as biomarkers provide a measure of the false negative rate.
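The bookkeeping implied by this design can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in the study: the function name `score_calls`, the protein identifiers, and the convention of labeling each protein "human" or "yeast" are assumptions made for the example.

```python
def score_calls(called_differential, species_of):
    """Count true/false positives and false negatives for one spike level.

    called_differential: set of protein IDs flagged as differential.
    species_of: dict mapping every protein ID in the sample to
                "human" (spiked, truly differential) or "yeast" (matrix).
    """
    human = {p for p, s in species_of.items() if s == "human"}
    yeast = {p for p, s in species_of.items() if s == "yeast"}

    true_pos = called_differential & human    # spiked proteins correctly flagged
    false_pos = called_differential & yeast   # matrix proteins wrongly flagged
    false_neg = human - called_differential   # spiked proteins that were missed

    n_called = len(true_pos) + len(false_pos)
    return {
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "false_negatives": len(false_neg),
        # Observed proportion of false calls among all calls (0 if no calls).
        "false_discovery_proportion": len(false_pos) / n_called if n_called else 0.0,
        # Fraction of spiked proteins that were not flagged.
        "false_negative_rate": len(false_neg) / len(human) if human else 0.0,
    }


# Hypothetical toy example: 3 spiked human proteins, 2 yeast matrix proteins.
species = {"H1": "human", "H2": "human", "H3": "human", "Y1": "yeast", "Y2": "yeast"}
result = score_calls({"H1", "H2", "Y1"}, species)
print(result)
```

Because the truth is known for every protein in a spiked standard, the observed false discovery proportion can be compared directly with the FDR that the statistical method reports.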
For each sample, we used the data resulting from all 21 independent LC-MS/MS replicate runs (nine from LTQ instruments and 12 from Orbitrap instruments). The data were analyzed using the SASPECT method (25), which uses a probability model to make inferences about protein abundances based on peptide detection. Permutation testing was used to estimate the FDR. The results for FDR ≤ 0.01 are shown in Table III. Not surprisingly, the more abundant the human proteins were, the more easily they were detected as differential. For the lowest spike-in level (0.25 fmol/l), we could not detect any of the human proteins as differential, whereas for the highest spike-in level (20 fmol/l), we were able to correctly identify 40 of 48 spiked-in proteins as differential.
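The general idea of estimating an FDR by permutation can be illustrated with a minimal Python sketch. It is not the SASPECT procedure: the per-protein statistic used here (difference in detection frequency between spiked and unspiked runs), the function names, and the toy data are all assumptions chosen for simplicity.

```python
import random


def detection_diff(detected, labels):
    """Per-protein statistic: detection frequency in spiked minus unspiked runs.

    detected: list of runs, each a list of 0/1 detection flags (one per protein).
    labels:   list of 0/1 group labels, one per run (1 = spiked, 0 = unspiked).
    """
    n_spiked = sum(labels)
    n_unspiked = len(labels) - n_spiked
    stats = []
    for j in range(len(detected[0])):
        f1 = sum(run[j] for run, g in zip(detected, labels) if g == 1) / n_spiked
        f0 = sum(run[j] for run, g in zip(detected, labels) if g == 0) / n_unspiked
        stats.append(f1 - f0)
    return stats


def permutation_fdr(detected, labels, threshold, n_perm=200, seed=0):
    """Estimate the FDR for calling proteins whose statistic is >= threshold."""
    rng = random.Random(seed)
    observed = detection_diff(detected, labels)
    n_called = sum(s >= threshold for s in observed)
    if n_called == 0:
        return 0.0, 0
    null_calls = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break the link between runs and groups
        null_calls += sum(s >= threshold for s in detection_diff(detected, shuffled))
    expected_false = null_calls / n_perm  # mean number of calls under the null
    return min(1.0, expected_false / n_called), n_called


# Hypothetical toy data: 4 spiked and 4 unspiked runs, 3 proteins.
# Protein 0 is seen only in spiked runs; proteins 1 and 2 are seen in every run.
runs = [[1, 1, 1]] * 4 + [[0, 1, 1]] * 4
groups = [1, 1, 1, 1, 0, 0, 0, 0]
fdr, n_called = permutation_fdr(runs, groups, threshold=1.0)
print(n_called, fdr)
```

Shuffling the group labels preserves the detection pattern of each run while removing any real association with spiking, so the number of proteins that pass the threshold after shuffling approximates the number of false calls expected by chance.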

DISCUSSION
The major purpose of this study was to provide a well characterized complex proteome performance standard (and an associated reference data set) for LC-MS benchmarking and to make this material available to the proteomics community as a means of comparing the performance of LC-MS/MS platforms (i) over time (as a quality control), (ii) after the addition of new technologies (to evaluate their effectiveness compared with current technologies), and (iii) between laboratories (to inform optimization and troubleshooting). Although our studies were focused on commonly used ion trap instruments, the reference sample can be applied to any platform of interest to benchmark performance over time or between laboratories. For example, although our studies focused on CID for the MS/MS analysis, in view of the recent demonstrations of the complementary value of CID and electron transfer dissociation in proteome analyses (29,30), it may be of interest to use the performance standard to further evaluate these two approaches.
Fig. 3. Protein detection is defined as observing two or more peptides mapping to the same protein (in a single RPLC run). On the x axis, "Spike concentration" refers to the concentration of the 48 equimolar human proteins (UPS1) spiked into the yeast matrix. Also on the x axis, "Yeast" refers to the unspiked matrix (i.e. 0 fmol/l UPS1). a shows that the number of detected UPS1 proteins increases with increasing spike concentration. Note that at an equimolar spike concentration of 2.2 fmol/l all instruments detect at least one UPS1 protein in each run. b and c show, respectively, the total number of yeast proteins and peptides detected per RPLC run. Different instruments show large variation in both the number of proteins/peptides detected and in the response to increasing spike concentration (p < 0.0001).
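The detection criterion stated in the Fig. 3 legend (two or more peptides mapping to the same protein in a single RPLC run) can be applied to any list of peptide identifications. The following Python sketch is illustrative: the function name `detected_proteins` and the toy identifications are hypothetical, and it assumes that "two or more peptides" means two or more distinct peptide sequences.

```python
from collections import defaultdict


def detected_proteins(peptide_ids, min_peptides=2):
    """Return proteins supported by at least `min_peptides` distinct peptides.

    peptide_ids: iterable of (peptide_sequence, protein_id) pairs from ONE run.
    """
    peptides_by_protein = defaultdict(set)
    for peptide, protein in peptide_ids:
        peptides_by_protein[protein].add(peptide)  # a set ignores repeated spectra
    return {p for p, peps in peptides_by_protein.items() if len(peps) >= min_peptides}


# Hypothetical identifications from a single run.
run = [
    ("PEPTIDEA", "P1"), ("PEPTIDEB", "P1"),  # two distinct peptides: detected
    ("PEPTIDEC", "P2"), ("PEPTIDEC", "P2"),  # same peptide twice: not detected
    ("PEPTIDED", "P3"),                      # single peptide: not detected
]
print(sorted(detected_proteins(run)))
```

Applying the same function to each run separately, rather than to pooled identifications, keeps the counts comparable with the per-run values plotted in Fig. 3.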
Of note, the yeast performance standard is now the only true biological performance standard available as a well characterized preparation (obtainable from NIST; see "Experimental Procedures"). Individual laboratories using common ion trap instruments can now analyze the performance standard and determine how their platforms perform relative to the reference data summarized in this study, providing a measure of how their LC-MS platforms are performing relative to those in other laboratories in the community. For example, one can determine how many peptides and proteins their LC-MS system identifies from the yeast reference and how deeply (CN50) and reproducibly their platform samples the proteome; these results can be compared with the data described in this study. If the results differ significantly and the new data exceed the performance of the reference data, then a comparison of the methods used should reveal parameters that further optimize LC-MS performance in the community. If the results show subpar performance compared with the reference data set, this will facilitate the correction of potential underlying problems. Application of the performance metrics (Table II and the accompanying study by Rudnick et al. (32)) will facilitate the differential diagnosis of underlying problems as illustrated in this study.
The one-dimensional LC-MS/MS platforms we used are the simplest of shotgun analysis systems and represent the minimal depth of proteome coverage available with current methods. Multidimensional LC-MS/MS would substantially increase the numbers of identifications achieved. Because the focus of the CPTAC interlaboratory studies described here was on the performance of the reverse phase LC-MS/MS platform, we did not attempt to incorporate multidimensional peptide separations. (The combination of different peptide separation steps would have greatly complicated interpretation of sources of system variability.) Nevertheless, the yeast reference is well suited to benchmark more complex work flows that provide more in-depth proteome coverage (e.g. combining an upstream protein or peptide fractionation step with LC-MS/MS). For example, the CN50 value (Fig. 2 and Table I) associated with a given experimental work flow may serve as a useful performance metric for comparisons between alternative work flows and would also be a useful means of assessing the enhancement of proteome coverage by multidimensional protein and peptide fractionation steps.
As discussed elsewhere (2, 3), there is a need for multiple types of proteomics reference materials. For example, although yeast provides an invaluable complex performance standard, the "shape" (i.e. the abundance range of the constituent proteins) of the yeast proteome is not identical to that of all biological matrices (e.g. plasma), and optimization of some performance characteristics is likely to be context-specific. Additionally, the yeast performance standard is not useful for testing the performance of species-specific technologies, such as the immunodepletion columns designed to remove abundant proteins from human plasma (31) (unless human plasma is spiked into the yeast proteome). Hence, it would be of great value to the community to have several well characterized types of reference materials so that an appropriate material could be chosen to match the target application. Of note, because the abundances of the majority of the yeast proteins have been estimated (16), one can also imagine using the yeast performance standard as the spike. For example, if one wanted to benchmark the depth of coverage of their work flow in a plasma matrix, a small amount of the yeast reference could be spiked into the plasma, thus providing several thousand proteins whose relative abundances are known and whose detection efficiencies could be determined as a metric of performance.
Table III. To demonstrate the utility of the reference proteome for benchmarking performance of biomarker discovery strategies, we used the Study 6 spiked yeast reference data to investigate the power of detecting potential biomarkers under typical case-control settings. The data were analyzed using the SASPECT method (25), which uses a probability model to make inferences about protein abundances based on peptide detection. Permutation testing is used to estimate the FDR.
An increasingly common goal of proteomics is to discover proteins that are differentially expressed between two classes of biological samples (e.g. treated versus untreated, case versus control, etc.). In such comparative proteomics experiments, it is desirable to minimize sources of variation to maximize the statistical power for detecting proteins that are differentially present. In such experiments, there are several types of variation, including biological, preanalytical, and analytical variation. Biological variation is inherent to the cell or organism being studied; it is a fact of nature. In contrast, preanalytical and analytical variations are introduced by our experimental protocols and instruments and thus to some extent can be manipulated. Preanalytical variation is introduced during sample collection, handling, and processing upstream of LC-MS, whereas analytical variation is specifically associated with changes in performance over time of the LC-MS system. A performance standard cannot be used to determine biological variation; however, a performance standard (such as yeast) is invaluable for measuring analytical and preanalytical variation to evaluate the effectiveness of interventions designed to minimize this variation. Where unavoidable variation remains, it is useful to characterize and measure variation during the course of an experiment so that it can be accounted for in the data analysis and interpretation as well as in planning experimental designs.
The results shown in Table III are presented as a proof of concept of the benchmarking that can be done using a spiked performance standard (yeast or other) where the proteome has been highly characterized. This analysis could be applied to more complex work flows typically used in comparative proteomics experiments where greater depth of coverage is desired; for example, the spiked samples could be subjected to off-line strong cation exchange chromatography prior to LC-MS, and the effect of this additional processing step on the sensitivity and specificity of detecting differential proteins could be determined and optimized. (