Direct Iterative Protein Profiling (DIPP) - an Innovative Method for Large-scale Protein Detection Applied to Budding Yeast Mitosis

The budding yeast Saccharomyces cerevisiae is a major model organism for important biological processes such as mitotic growth and meiotic development; it can be a human pathogen, and it is widely used in the food and biotechnology industries. Consequently, the genomes of numerous strains have been sequenced and a very large amount of RNA profiling data is available. Moreover, it has recently become possible to quantitatively analyze the entire yeast proteome; however, efficient and cost-effective high-throughput protein profiling remains a challenge. We report here a new approach to direct and label-free large-scale yeast protein identification using a tandem buffer system for protein extraction, two-step protein prefractionation and enzymatic digestion, and detection of peptides by iterative mass spectrometry. Our profiling study of diploid cells undergoing rapid mitotic growth identified 86% of the known proteins, and its output was found to be widely concordant with genome-wide mRNA concentrations and DNA variations between yeast strains. This paves the way for comprehensive and straightforward yeast proteome profiling across a wide variety of experimental conditions.

More recently, major technological advances have yielded quantitative information on the yeast proteome in both haploid and diploid cycling cells (15). However, these methods are cumbersome and technically challenging, and they rely on cellular uptake of amino acid analogs for protein labeling (stable-isotope labeling by amino acids in cell culture, SILAC (16)). This hampers efforts to study the dynamic proteome under conditions, such as stress response and gametogenesis, that alter the ability of cells to absorb and process nutrients (17,18). This is a critical issue for efforts to complement the rapidly growing body of data on DNA and RNA with reliable information on most, if not all, proteins under many conditions and in different strain backgrounds. A promising solution for this experimental challenge is selected reaction monitoring (SRM), a highly sensitive method with a large dynamic range that has been used to detect 100 proteins in a single run (19).
We report the development of Direct Iterative Protein Profiling (DIPP), an innovative, robust and highly sensitive method for protein profiling. Critically, DIPP does not require the uptake of amino acid analogs, making it suitable for the analysis of a wide range of experimental conditions and mutant strains. The procedure includes a tandem buffer system for protein extraction and a simple acrylamide-gel based step for protein prefractionation and cleavage, followed by three consecutive rounds of peptide detection and protein identification using mass spectrometry and algorithms implemented in Mascot and SEQUEST. We have employed DIPP to study duplicate samples from diploid SK1 MATa/α cells undergoing rapid mitotic growth and division in rich medium. The vast majority of the proteins predicted in the yeast genome were identified at least once (86%) (20). For many proteins not detected we observed very little or no mRNA expression (21), or we identified strain-specific DNA variations likely deleterious for the proteins (22). Our simplified and versatile method covers the yeast proteome to a level that is comparable to the most sophisticated approach available today (15). DIPP paves the way for future efforts to study the dynamic budding yeast proteome under many experimental conditions in distinct strains.
at 30°C, and a thermostated autosampler kept at 8°C to reduce sample evaporation. Mobile A (99.9% MilliQ water and 0.1% formic acid (v:v)) and B (99.9% acetonitrile and 0.1% formic acid (v:v)) phases for HPLC were delivered by the Ultimate 3000 nanoflow LC system (Dionex, LC Packings). Ten microliters of prepared peptide mixture was loaded on a trapping precolumn (5 mm × 300 µm i.d., 300 Å pore size, Pepmap C18, 5 µm) for 3 min in 2% buffer B at a flow rate of 25 µl/minute. This step was followed by reverse-phase separations at a flow rate of 0.250 µl/minute using an analytical column (15 cm × 300 µm i.d., 300 Å pore size, Pepmap C18, 5 µm, Dionex, LC Packings). We ran a gradient ranging from 2 to 35% buffer B for the first 60 min, 35 to 60% buffer B from minutes 60-85, and 60 to 90% buffer B from minutes 85-105. Finally, the column was washed with 90% buffer B for 16 min, and with 2% buffer B for 19 min prior to loading of the next sample. The peptides were detected by directly eluting them from the HPLC column into the electrospray ion source of the mass spectrometer. An electrospray ionization voltage of 1.5 kV was applied to the HPLC buffer using the liquid junction provided by the nanoelectrospray ion source and the ion transfer tube temperature was set to 200°C.
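The gradient program described above can be written down as a piecewise-linear function of run time. The following is a minimal sketch: the segment boundaries and percentages are transcribed from the text, whereas the table layout, the function name `percent_b`, and the assumption that each step switches instantaneously are illustrative choices and not part of the published protocol.

```python
# Gradient segments as (start_min, end_min, start_percent_b, end_percent_b).
# Values come from the text; the data layout is an illustrative assumption.
GRADIENT = [
    (0.0, 60.0, 2.0, 35.0),      # 2 to 35% B over the first 60 min
    (60.0, 85.0, 35.0, 60.0),    # 35 to 60% B, minutes 60-85
    (85.0, 105.0, 60.0, 90.0),   # 60 to 90% B, minutes 85-105
    (105.0, 121.0, 90.0, 90.0),  # wash at 90% B for 16 min
    (121.0, 140.0, 2.0, 2.0),    # 2% B for 19 min before the next sample
]


def percent_b(t_min: float) -> float:
    """Return the programmed percentage of buffer B at run time t_min."""
    for start, end, b0, b1 in GRADIENT:
        if start <= t_min < end:
            # Linear interpolation inside the segment.
            return b0 + (b1 - b0) * (t_min - start) / (end - start)
    if t_min == GRADIENT[-1][1]:
        return GRADIENT[-1][3]
    raise ValueError("time outside the 0-140 min program")


if __name__ == "__main__":
    for t in (0, 30, 60, 85, 105, 121, 140):
        print(f"{t:>4} min: {percent_b(t):5.1f}% B")
```

Summing the segment lengths gives a total program of 140 min per injection under these assumptions.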
The MS instrument was operated in its data-dependent mode by automatically switching between full survey scan MS and consecutive MS/MS acquisition. Survey full scan MS spectra (mass range 400-2000) were acquired in the OrbiTrap section of the instrument with a resolution of r = 60,000 at m/z 400; ion injection times are calculated for each spectrum to allow for accumulation of 10⁶ ions in the OrbiTrap. The seven most intense peptide ions in each survey scan with an intensity above 2000 counts (to avoid triggering fragmentation too early during the peptide elution profile) and a charge state ≥2 were sequentially isolated at a target value of 10,000 and fragmented in the linear ion trap by collision induced dissociation. Normalized collision energy was set to 35% with an activation time of 30 milliseconds. Peaks selected for fragmentation were automatically put on a dynamic exclusion list for 120 s with a mass tolerance of ±10 ppm to avoid selecting the same ion for fragmentation more than once. The following parameters were used: the repeat count was set to 1, the exclusion list size limit was 500, singly charged precursors were rejected, and the maximum injection time was set at 500 ms and 300 ms for full MS and MS/MS scan events, respectively. For an optimal duty cycle the fragment ion spectra were recorded in the LTQ mass spectrometer in parallel with the OrbiTrap full scan detection. For OrbiTrap measurements, an external calibration was used before each injection series, ensuring an overall mass accuracy error below 5 ppm for the detected peptides. MS data were saved in RAW file format (Thermo Fisher Scientific) using XCalibur 2.0.7 with tune 2.4.
Data Processing, Generation of Exclusion Lists and Identification of Peptides and Proteins-The data analysis was performed with the Proteome Discoverer 1.2 software supported by Mascot (Matrixscience) and SEQUEST database search engines for peptide and protein identification. MS/MS spectra were first compared with all predicted budding yeast proteins (data provided by Saccharomyces Genome Database release 06/01/2010; number of residues: 3020761, number of sequences: 6717) (20). Mass tolerance for MS and MS/MS was set at 10 ppm and 0.5 Dalton, respectively. The enzyme selectivity was set to full trypsin with one missed cleavage allowed. Protein modifications were fixed carbamidomethylation of cysteines, variable oxidation of methionine, variable acetylation of lysine, and variable phosphorylation of serine, threonine and tyrosine. Identified peptides were filtered based on Xcorr values and the Mascot score to obtain a false discovery rate of 1% and a false positive rate of 5%. We employed Proteome Discoverer to generate lists of peptides identified in the first and second run that are excluded in subsequent LC-MS/MS analyses. Prior to the third analysis, peptide exclusion files from the first two runs are combined. Lists of peptides not filtered out are exported as a text file containing uncharged and accurate mass values (at five decimals) and a retention time window of approximately 1 min. The instrument is configured to work with uncharged masses and to automatically calculate the mass of a peptide based on its exact mass and charge state. A mass tolerance of ±10 ppm is used to reject previously identified peptides within the specified retention time window.
Using lower values can lead to reselection of masses because in the parallel mode of operation on an LTQ OrbiTrap XL, the parent ion selection for an ion trap MS/MS is based on an OrbiTrap preview scan that is acquired at a lower resolution (RP 15,000) than the final OrbiTrap full scan, and therefore the masses are less accurate. The list of identified proteins is provided in Supplemental Files 4 (YPD1) and 5 (YPD2).
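The rejection rule described above (an uncharged accurate mass, a ±10 ppm tolerance, and a retention time window) can be illustrated with a short sketch. This is not the instrument firmware or the Proteome Discoverer export; the class `ExcludedPeptide`, the function names, and the example values are hypothetical and serve only to make the matching logic concrete.

```python
from dataclasses import dataclass
from typing import Iterable

PROTON_MASS = 1.007276  # Da, mass of a proton


@dataclass(frozen=True)
class ExcludedPeptide:
    """One entry of a hypothetical accurate-mass exclusion list."""
    neutral_mass: float  # uncharged monoisotopic mass in Da
    rt_start_min: float  # start of the retention time window
    rt_end_min: float    # end of the retention time window


def neutral_mass(mz: float, charge: int) -> float:
    """Convert an observed m/z and charge state to an uncharged mass."""
    return (mz - PROTON_MASS) * charge


def ppm_error(observed: float, reference: float) -> float:
    """Signed mass deviation in parts per million."""
    return (observed - reference) / reference * 1e6


def is_excluded(mz: float, charge: int, rt_min: float,
                exclusion_list: Iterable[ExcludedPeptide],
                tol_ppm: float = 10.0) -> bool:
    """True if the precursor matches a listed peptide in mass and time."""
    mass = neutral_mass(mz, charge)
    return any(
        entry.rt_start_min <= rt_min <= entry.rt_end_min
        and abs(ppm_error(mass, entry.neutral_mass)) <= tol_ppm
        for entry in exclusion_list
    )


if __name__ == "__main__":
    # Hypothetical entry: a 1500 Da peptide eluting between 30 and 31 min.
    exclusions = [ExcludedPeptide(1500.00000, 30.0, 31.0)]
    print(is_excluded(751.007276, 2, 30.5, exclusions))  # same mass, in window
    print(is_excluded(751.007276, 2, 45.0, exclusions))  # same mass, later
```

Because the tolerance is expressed in ppm, the absolute mass window scales with peptide mass; at 1500 Da, 10 ppm corresponds to 0.015 Da.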
Tiling Array Expression Data-DNA-strand specific whole-genome expression data obtained with tiling arrays and duplicate samples from diploid SK1 cells cultured in rich medium (YPD) were integrated and compared with the mass spectrometry measurements; data processing methods and expression threshold level parameters were as published (21). For each gene listed in the reference genome, we selected the segments derived from Sc_tiling experiments overlapping by at least 50 bp. When a gene was overlapping with several segments, it was considered as expressed if at least one of the segments was expressed above threshold.
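The expression call described above can be sketched as follows: a gene counts as expressed when at least one tiling-array segment on the same strand overlaps it by at least 50 bp and lies above the expression threshold. The data structures, the half-open coordinate convention, and the numeric threshold in the example are assumptions made for illustration; the published thresholds are those of reference (21).

```python
from typing import Iterable, NamedTuple


class Gene(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


class Segment(NamedTuple):
    start: int
    end: int
    strand: str
    signal: float  # segment expression level (arbitrary units)


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of base pairs shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def gene_is_expressed(gene: Gene, segments: Iterable[Segment],
                      threshold: float, min_overlap: int = 50) -> bool:
    """True if any sufficiently overlapping same-strand segment is above threshold."""
    return any(
        seg.strand == gene.strand
        and overlap_bp(gene.start, gene.end, seg.start, seg.end) >= min_overlap
        and seg.signal > threshold
        for seg in segments
    )


if __name__ == "__main__":
    gene = Gene(1000, 2000, "+")
    segments = [
        Segment(900, 1040, "+", 9.0),   # only 40 bp overlap: ignored
        Segment(1500, 1700, "+", 3.0),  # overlaps, but below threshold
        Segment(1900, 2100, "+", 8.0),  # 100 bp overlap, above threshold
    ]
    print(gene_is_expressed(gene, segments, threshold=5.0))
```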
Protein Abundance Data-The relative abundance data of proteins expressed in log-phase growth were extracted from the quantitative Western blot analysis of tandem affinity purification-tagged strains available via Saccharomyces Genome Database (SGD) (http://yeastgfp.yeastgenome.org/) (24).
DNA Variations Between Yeast Strains S288c and SK1-The variations between the haploid reference S. cerevisiae strain S288c and haploid SK1 were obtained from the Yeast Population Genomics project (21). For each gene, we extracted the reference sequence and the corresponding sequence in SK1. Both sequences were translated (synonymous mutations were thus ignored) and sequences were aligned using a classical Needleman and Wunsch algorithm. We then identified deletions, single nucleotide polymorphisms that create stop codons, and nonsynonymous variations in the SK1 genome. To distinguish between nonsynonymous variations occurring in conserved or nonconserved positions we used the fungal alignment provided by SGD (19,22). Proteins lacking homologs across yeast species and proteins for which the reference sequence has changed in SGD since the study by Liti et al. (22) was published were excluded. A bilateral statistical test was used to determine if undetectable proteins are more often mutated in conserved positions than observed proteins.
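The comparison of translated sequences described above relies on global alignment. The sketch below implements a textbook Needleman-Wunsch alignment and then lists substitutions and gapped columns. The scoring scheme (match +1, mismatch -1, linear gap -2) and the short example peptides are assumptions chosen for readability; the text does not state the scoring parameters that were actually used, and protein alignments would normally use a substitution matrix.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def classify_differences(ref_aln: str, alt_aln: str):
    """Return (substitutions, gap_columns); positions are 1-based in the reference."""
    substitutions, gap_columns, pos = [], 0, 0
    for r, s in zip(ref_aln, alt_aln):
        if r != "-":
            pos += 1
        if r == "-" or s == "-":
            gap_columns += 1
        elif r != s:
            substitutions.append((pos, r, s))
    return substitutions, gap_columns


if __name__ == "__main__":
    # Hypothetical reference and variant peptides, for illustration only.
    ref_aln, alt_aln, score = needleman_wunsch("MKTAYIAKQR", "MKTAYLAKQR")
    print(ref_aln, alt_aln, score)
    print(classify_differences(ref_aln, alt_aln))
```

Substitutions found this way would then be checked against a multi-species alignment to decide whether they fall on conserved positions; that step is not shown.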
MIAPE Compliance-The raw MS spectra were uploaded to the EBI's PRIDE repository and are available at http://www.ebi.ac.uk/pride/.

Experimental Design and Workflow-It is our ultimate goal to study the proteome of the budding yeast life cycle. To establish a suitable method we first sought to determine the complete proteome of diploid budding yeast cells undergoing rapid mitotic growth and division in the presence of glucose (fermentation). To this end, we inoculated two cultures with independent colonies of SK1 MATa/α cells and grew them to mid-log phase in rich medium (YPD1 and YPD2). We chose SK1 because it displays normal mitotic growth properties and, as opposed to the reference strain S288c, it undergoes meiosis and gametogenesis efficiently. Moreover, SK1's genome sequence is available (albeit poorly annotated) (22) and we have a large mitotic and meiotic tiling array expression data set for this strain background (21). To maximize protein solubility and peptide detection we prepared extracts using two different buffer systems and then separated the combined protein samples based on their molecular weight via SDS-PAGE. Next, we digested protein fractions present in 30 slices from each of the two lanes with trypsin, and analyzed the peptides with a mass spectrometer during three consecutive rounds of injection; accurate mass exclusion lists of identified peptides were established at each round. Samples were analyzed in duplicate to estimate DIPP's level of reproducibility. Finally, proteome data were interpreted in the context of information on the degree of DNA sequence conservation (25), and DNA mutations such as insertions and deletions (indels) and single nucleotide polymorphisms (22) as well as genome-wide RNA concentrations available for the SK1 strain (Fig. 1, see Materials and Methods) (21).
The Core Budding Yeast Proteome of Mitotic Growth-According to the SGD (release 18/10/2010), the 16 chromosomes of the budding yeast genome contain 6685 protein coding genes comprising 4864 verified open reading frames (ORFs, including four silenced genes), 910 uncharacterized ORFs, 801 dubious ORFs, and 110 unclassified ORFs (20). We have identified at least once 4952 out of 6685 theoretically predicted proteins (74%) as being present in mitotically growing cells. Importantly, when taking only the verified genes into account, the output of our experiment covers 86% of the predicted yeast proteome (4175 proteins as compared to 4864 ORFs). This suggests that we have achieved essentially complete coverage of the protein profile in fermenting cells because several hundred proteins are involved in processes not included in our analysis, such as haploid-specific pheromone signal transduction and mating, filamentous growth, stress-response, respiration (mitotic growth in the presence of a nonfermentable carbon source), and sporulation (supplemental File S1).
As expected, the vast majority of the proteins identified fall into the class of verified ORFs (3823 and 3519 in YPD1 and YPD2, respectively) but we also detected proteins corresponding to uncharacterized genes (YPD1: 442; YPD2: 394), dubious ORFs (YPD1: 116; YPD2: 138), and unclassified loci (YPD1: 22; YPD2: 18), for totals of 4403 and 4069 proteins in YPD1 and YPD2, respectively (Table I). The surprisingly large number of proteins associated with poorly characterized genes or dubious loci that are not conserved or that overlap with larger validated genes emphasizes that the budding yeast S288c reference genome, 15 years after its initial publication (26), is not yet exhaustively annotated.
A confounding aspect of the yeast genome annotation project is the finding that backgrounds such as SK1 may not only lack genes present in the reference strain (23) but they may also contain protein-coding genes that are missing in S288c (22). To test this idea we investigated 17 hypothetical ORFs absent in the reference strain but present and conserved in the genomes of several S. cerevisiae strains including SK1 (21). Mapping the peptides identified in YPD1 and YPD2 samples confirmed the presence of gene products in all cases to variable degrees of confidence depending on how many peptides were found for each putative protein and how reproducible protein detection was (supplemental File S2). We conclude that comparative genomics is indeed facilitating the discovery of bona fide protein-coding genes in S. cerevisiae and that efforts to identify the full complement of genes present in the budding yeast genome will require information from many different strain backgrounds.

SDS-Gel Based Protein Prefractionation is Robust and Reproducible-We next explored how efficient and reproducible our simple SDS-PAGE based approach to fractionation of the yeast proteome was. To this end, we plotted the size of the proteins (as the median number of amino acids) over the 30 slices from the top (slice 1) to the bottom (slice 30) of the gel. Somewhat unexpectedly, we observed a negative (albeit reproducible) correlation between molecular weight and migration speed within the top four slices; in contrast, a clear and highly reproducible correlation between the molecular weight and the migration position was apparent in slices 5-30 (Fig. 2A). We also found by and large similar numbers of proteins, varying between 400 and 800, within the two sets of 30 slices (Fig. 2B). These results indicate that although we reached the limit of protein separation via size at the very high molecular weight range, manually cut slices yielded consistent results throughout the entire range of the running gel. We then asked how efficiently proteins were separated based on their molecular weight given that a large amount of protein extract was loaded onto the SDS gel to increase the concentration of peptides injected into the mass spectrometer (see Materials and Methods). Although ~1300 proteins were found in one slice each (1252 in YPD1 and 1365 in YPD2), all other proteins were present in more than one slice and around 400 proteins were found on average in 10-30 slices, with 97 proteins being present in 20-30 slices (Fig. 3A). The distribution of proteins within slices was found to be highly reproducible (correlation coefficient >0.89 between YPD1 and YPD2). One likely explanation for this phenomenon was that highly abundant proteins would saturate the gel system.
To test this idea we plotted the average number of slices in which a protein was detected against its concentration in molecules per cell (24) and found that cellular protein abundance was strikingly correlated with the tendency of a protein to be detected in more than one band (Fig. 3B). Concordantly, the group of 97 proteins found in 20-30 slices for which Gene Ontology (27) annotation data were available was significantly enriched for, among others, Translation (p value 1.72 × 10⁻²⁰), Glucose metabolic process (1.94 × 10⁻²⁰), and Protein metabolic process (6.19 × 10⁻¹²). We conclude that our method is suitable for prefractionation of most yeast proteins and, specifically, that abundant proteins (including those which completely saturate the gel) are prevented from saturating the MS system.
DIPP Yields a Core Protein Complement Across Duplicate Experiments-An important precondition for profiling multiple experimental conditions is to ensure that replicates within a given condition are sufficiently reproducible so that meaningful results can be obtained. To test the robustness of DIPP we compared the output of two independent profiling studies (YPD1 and YPD2) first by taking all predicted ORFs into account. Among 4403 proteins detected in YPD1 and 4069 found in YPD2 we identified 3520 twice whereas 883 and 549 were detected only in YPD1 or YPD2, respectively (Fig. 4A). Among verified ORFs we scored 3823 proteins in YPD1 and 3519 in YPD2 as present; 3167 proteins were detected in both samples whereas 656 and 352 were identified only in YPD1 or YPD2, respectively (Fig. 4B).
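The replicate comparison above reduces to set arithmetic, which the following sketch makes explicit. The counts passed to `summarize` are the ones reported in the text; the function names and the placeholder identifiers in the toy example are illustrative assumptions.

```python
def replicate_overlap(sample_a, sample_b):
    """Count identifiers found in both samples or in only one of them."""
    a, b = set(sample_a), set(sample_b)
    return {"both": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


def summarize(both: int, only_a: int, only_b: int):
    """Derive per-sample totals, the union, and the shared fraction."""
    union = both + only_a + only_b
    return {
        "total_a": both + only_a,
        "total_b": both + only_b,
        "union": union,
        "shared_fraction": both / union,
    }


if __name__ == "__main__":
    # Toy example with placeholder identifiers.
    print(replicate_overlap({"orf1", "orf2", "orf3"}, {"orf2", "orf3", "orf4"}))

    # Counts reported in the text (YPD1 as sample a, YPD2 as sample b).
    all_orfs = summarize(both=3520, only_a=883, only_b=549)
    verified = summarize(both=3167, only_a=656, only_b=352)
    print(all_orfs)
    print(verified)
```

With the reported counts, the union for all predicted ORFs comes to 4952 proteins and the union for verified ORFs to 4175, which agree with the totals given in the preceding section.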
Recent work using SILAC has yielded quantitative information on the mitotic proteome in fermenting haploid and diploid cells from the S288c reference strain background (15). In this study, 4386 proteins were identified based on a combination of three different prefractionation strategies. We compared this experiment with our simplified method and found that 3963 proteins were detected in both studies, 423 were reported only by de Godoy et al., and 989 proteins were identified only by DIPP (Figs. 4C and 4D). Furthermore, we compared our results with the output of SRM, a highly sensitive protein profiling method that was applied to groups of proteins present over a very wide range of concentrations: we identified all of ten proteins present at <128 copies/cell (six were detected twice and four only once), all of five proteins reported to be present at <50 copies/cell, and all 15 proteins found by SRM but not by quantitative Western blotting (19,24). Moreover, among 15 proteins not identified by SRM but known to be present we detect six twice and one only once (19). Finally, we scored as present seven out of 10 proteins (four twice, three once) found by SRM although no peptide is associated with them in PeptideAtlas (supplemental File S1) (28). Taken together, these results highlight the robustness and sensitivity of DIPP.
There is a clear correlation between our ability to detect proteins and the level of confidence associated with their biological relevance: among 4860 bona fide genes we detected proteins for 3167 loci twice (65%), for 1008 loci once (21%), and for 685 cases (14%) in neither of the samples. For the group of uncharacterized ORFs we found proteins for 296 loci twice (32%), for 244 loci once (26%), and for 370 cases (40%) we failed to detect a protein. This tendency is even more apparent in the group of dubious ORFs: among 801 cases we find proteins twice for 47 ORFs (6%), once for 160 ORFs (20%), and never for 594 ORFs (74%). Likewise, among 114 cases of unclassified or silenced loci we detected proteins twice for 10 ORFs (9%), once for 20 ORFs (18%), and in none of the samples for 84 ORFs (73%) (Table II; Fig. 5).
The corollary is that DIPP of fermenting diploid cells reproducibly detects approximately two thirds of the proteins encoded by validated genes and one third of the proteins encoded by uncharacterized loci. Furthermore, the vast majority of dubious ORFs do not seem to encode proteins expressed to a level allowing for their detection in asynchronously growing cells.
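The per-class tallies above can be cross-checked with a few lines of arithmetic. The sketch below takes the (twice, once, never) counts from the text and derives class totals and detection fractions; the dictionary layout and function name are illustrative assumptions, and the fractions are left unrounded because the rounding convention of the published percentages is not stated.

```python
# (detected twice, detected once, never detected), as reported in the text.
CLASS_COUNTS = {
    "verified": (3167, 1008, 685),
    "uncharacterized": (296, 244, 370),
    "dubious": (47, 160, 594),
    "unclassified_or_silenced": (10, 20, 84),
}


def detection_fractions(twice: int, once: int, never: int):
    """Return the class total and the fraction of loci in each detection state."""
    total = twice + once + never
    return {
        "total": total,
        "twice": twice / total,
        "at_least_once": (twice + once) / total,
        "never": never / total,
    }


if __name__ == "__main__":
    for name, counts in CLASS_COUNTS.items():
        f = detection_fractions(*counts)
        print(f"{name:26s} n={f['total']:5d} "
              f"twice={f['twice']:.3f} at_least_once={f['at_least_once']:.3f}")
    detected = sum(t + o for t, o, _ in CLASS_COUNTS.values())
    missing = sum(n for _, _, n in CLASS_COUNTS.values())
    print("detected at least once:", detected, "never detected:", missing)
```

Summed over the four classes, the counts give 4952 proteins detected at least once and 1733 never detected, consistent with the totals reported elsewhere in the text.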
Iterative [...] saturation of the MS system. A key element of our method apart from protein prefractionation is to consecutively filter the most abundant peptides that obscure MS signals during iterative rounds of injections. 4358 out of 4952 proteins (88%) are detected after the first injection. However, we found 418 additional proteins during the second round and 176 proteins during the third round (Fig. 6). The data indicate that our approach facilitates the production of interpretable spectra and thereby increases the number of proteins detected. Moreover, we also found that iterative injections increase the reproducibility of our data: with a single injection 65% of the proteins identified are found in both replicates whereas this was the case for 71% after three rounds of iterative injections. Interestingly, a pilot experiment to the present study has shown that a triplicate injection of the same yeast extract without the use of an exclusion list strategy only led to an ~4% increase in the number of proteins identified (data not shown).
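The diminishing return of successive injection rounds can be made explicit by accumulating the newly identified proteins per round. The counts are those reported above; the function name and output layout are illustrative assumptions.

```python
from itertools import accumulate

# Proteins newly identified in rounds 1, 2 and 3, as reported in the text.
NEW_PER_ROUND = [4358, 418, 176]


def cumulative_gain(new_per_round):
    """Return (round, new, cumulative, fraction_of_final) for each round."""
    totals = list(accumulate(new_per_round))
    final = totals[-1]
    return [
        (round_no, new, total, total / final)
        for round_no, (new, total) in enumerate(zip(new_per_round, totals), start=1)
    ]


if __name__ == "__main__":
    for round_no, new, total, fraction in cumulative_gain(NEW_PER_ROUND):
        print(f"round {round_no}: +{new:4d} new, {total} cumulative "
              f"({fraction:.1%} of final)")
```

Relative to what was already identified, the second round adds 418 proteins to 4358 and the third adds 176 to 4776, which illustrates the plateau behavior without implying where a fourth round would land.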

Information About DNA Mutations and RNA Expression May Help Predict Protein Stability-An intriguing outcome of our experiment was that 1733 predicted proteins, including 685 encoded by bona fide genes, were not detected in the replicate DIPP analyses. We therefore set out to explain their absence by integrating proteome data with information on DNA variations between S288c and SK1 strains and RNA expression profiles as determined by microarrays in the SK1 strain background (Fig. 7A) (21,22). 132 genes (8%) were found to be entirely deleted and an additional 258 (15%) lack at least one fifth of their primary sequence because they were partially deleted or because they contained variations that created stop codons leading to the translation of C-terminally truncated proteins in the SK1 background. As expected, C-terminal deletions are mostly small in stable proteins (those identified by DIPP) whereas they are frequently large in the case of proteins not identified (supplemental Fig. S3). Among the genes for which no protein was detected we found 440 dubious loci (25%); this is consistent with the profiling data because they are typically (albeit not exclusively (29)) annotation artifacts that are not expected to encode proteins. Furthermore, for 152 genes (9%) we did not observe expression signals above the threshold level of detection in a tiling array experiment (21), indicating that they are transcriptionally repressed in diploid cells undergoing mitotic growth in rich medium. In the remaining 751 cases (43%) we measured mRNA concentrations above the threshold level, which is consistent with the notion that these genes are post-translationally regulated. Coherently, many of the genes in this group are involved in inducible biological processes such as Response to stimulus (20%), Transport (19%), Meiosis/Sporulation/Conjugation (5%), or Filamentous growth (2%) (Fig. 7B).
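The partition of the 1733 undetected proteins into explanatory categories can be verified with a short calculation. The counts are taken from the text; the category labels and the function name are shorthand introduced here for illustration.

```python
# Undetected proteins by most likely explanation, as reported in the text.
UNDETECTED = {
    "gene entirely deleted in SK1": 132,
    "at least one fifth of sequence missing": 258,
    "dubious locus": 440,
    "no mRNA signal above threshold": 152,
    "mRNA above threshold, no protein": 751,
}


def percentage_shares(categories):
    """Return each category's share of the total, rounded to whole percent."""
    total = sum(categories.values())
    return {name: round(100 * count / total) for name, count in categories.items()}


if __name__ == "__main__":
    print("total undetected:", sum(UNDETECTED.values()))
    for name, share in percentage_shares(UNDETECTED).items():
        print(f"{share:3d}%  {name}")
```

The five categories are treated here as mutually exclusive, which is what the reported percentages imply since the counts sum to the full set of undetected proteins.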
We next investigated if the group for which mRNAs but no proteins were found in the cells showed a tendency to contain point mutations in conserved amino acids that might destabilize them. Among 751 cases, we found 247 proteins whose sequences were identical in SK1 and the reference strain S288c and 63 "orphan" proteins for which we were unable to determine sequence conservation because their genes appeared to have no fungal homolog (details in Materials and Methods). Among the remaining 441 cases we found that 335 proteins displayed amino acid substitutions in SK1 exclusively at variable positions, and 106 (24%) proteins contained mutations in at least one highly conserved position (Fig. 7C). Critically, when we determined the frequency of these types of mutations in a randomly selected control group of 441 stable proteins that were detected, we found only 77 proteins (17%) with mutations of highly conserved amino acids. A bilateral test revealed this difference to be statistically significant (significance level 0.05, see Materials and Methods). Overall these results suggest that the DNA variations between the reference yeast strain S288c and SK1 may in part explain protein instability observed in SK1. We have initiated experiments to test this idea.
[Figure legend] A color-coded bar diagram shows the number of proteins (y axis) identified during three rounds of injections (x axis) in both samples (light green) or in either YPD1 (dark green) or YPD2 (yellow) samples.
DISCUSSION
We have developed DIPP, a novel, robust and straightforward method to profile the proteome in simple eukaryotes and employed it to study mitotic growth in the presence of glucose (fermentation) in the budding yeast S. cerevisiae. DIPP requires only basic equipment for culturing cells, and for processing protein extracts.
Peptides were detected using the LTQ-OrbiTrap mass spectrometer-currently the most popular platform in proteomics-and an innovative approach based on iterative rounds of injection followed by masking detected peptides. Critically, our method does not need cell labeling, which means it is suitable for profiling the proteome under all conceivable experimental conditions, including those that entail metabolic changes that interfere with efficient SILAC labeling. We detected essentially all of the known proteins and the vast majority of them were found in duplicate samples, indicating that DIPP is likely efficient enough to carry out meaningful large-scale protein profiling experiments across distinct culture conditions.
A key question is how to increase the protein yield without saturating the system. In a pilot study analyzing a total protein extract in a single round of injections into a hybrid LTQ-OrbiTrap XL mass spectrometer we identified only ~5% of the predicted proteome, and using only one extraction buffer also yielded suboptimal results (data not shown). Protein fractionation was thus, unsurprisingly, found to be a critical step in large-scale profiling using current MS technology; however, it is tedious and costly in terms of equipment, reagents, and man hours. We therefore sought a simplified solution and found that using mechanical disruption of frozen samples and two buffers with distinct chaotropic properties helped recover proteins over a broad range of solubility (including many encoded by genes annotated as encoding membrane proteins). Rather than being analyzed separately, the protein solutions were then mixed prior to high-resolution SDS-gel prefractionation followed by in-gel digestion and iterative injection into the MS system.
Over the past years, shotgun proteomics emerged as a key method in the field (30-32) and various strategies were employed to tackle complex samples including using different MS instruments (33-35), new fragmentation techniques (36), inclusion lists (37), or repeated sample injections (38). However, none of these methods was as efficient as the approach based on peptide mass exclusion lists (39). A critical aspect of iterative mass spectrometry analysis is indeed that already detected peptides are masked during the consecutive step, thereby rendering protein detection more effective. To improve our ability to detect proteins we optimized the standard shotgun liquid chromatography-tandem MS (LC-MS/MS) approach using accurate masses and retention time of identified peptides to establish such exclusion lists (39). Our results show that after establishing the first list of identified peptides, the second injection yields a substantial number of proteins not found in the first round whereas the third injection produces a smaller yield, probably indicating a plateau effect. It is unclear how far the method could be extended but preliminary results seem to suggest that a fourth round of injection does not lead to the detection of a sufficiently large number of proteins to justify the cost and effort (R. Lavigne and C. Pineau, unpublished). A key question is whether the improvement in protein identification rate is due to the exclusion list strategy rather than chance sampling. It is acknowledged that repeated injections of the same sample improve the protein identification coverage by about 10%. However, this finding is genuinely relevant only for proteome samples of low and medium complexity. In the case of a yeast total cell lysate, it is difficult if not impossible to increase the number of proteins identified without peptide mass exclusion lists.
Indeed, a previous study of the yeast proteome using cell lysates shows that at least 10 replicate injections of the same sample are necessary to cover the proteome roughly as extensively as DIPP (32). In this context it should be noted that Piening et al. (38) needed as many as 31 consecutive LC-MS/MS analyses of a yeast cell lysate to reach the number of unique peptides that was identified in a similar sample using only six injections and a mass exclusion-based DDA strategy (i.e. 4550 versus 4490, respectively) (39).
An intriguing outcome of our analysis is that DIPP appears to be very robust and extremely sensitive: we identified proteins such as Erg20, Gcy1, Num1, Pdi1, and Uga2, which were thought to be detectable only by organelle-specific proteomics or extremely elaborate protein fractionation techniques and MS-based peptide detection methods with a threshold as low as 50 molecules per cell (19,24).
Why do we fail to detect 1733 predicted proteins in the proteome of diploid fermenting cells? One reason is that the SK1 genome contains deletions that remove entire genes and their products and DNA variations that lead to the synthesis of truncated proteins that are likely unstable and subject to rapid degradation for example via the unfolded protein response (22,40). Although complete or partial deletions obviously provide an excellent explanation for the absence of a protein-provided that our diploid SK1 strain is homozygous for these mutations originally defined in a haploid SK1 background (22)-nonsynonymous mutations represent a weaker, yet still plausible explanation. It is noteworthy in this context that proteins not detected by DIPP are more frequently associated with mutations affecting highly conserved amino acids than a randomly selected group of proteins that were detected at least once. In this context it should be noted that numerous ORFs among the 1733 cases might be annotation artifacts (especially the dubious ones) not encoding functional proteins (25,41).
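The comparison of mutation frequencies referred to above can be reproduced approximately from the counts given in the Results (106 of 441 undetected proteins versus 77 of 441 detected control proteins carrying a mutation at a highly conserved position). The text specifies only that a bilateral (two-sided) test was used; the two-proportion z-test below is therefore an assumption chosen for illustration and may differ from the test the authors applied.

```python
import math


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int):
    """Two-sided z-test for equality of two proportions (pooled variance).

    Returns (z, p_value). Uses the normal approximation, which is
    reasonable for group sizes of several hundred.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Counts from the Results: undetected group vs detected control group.
    z, p = two_proportion_z_test(106, 441, 77, 441)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under this assumed test the p value falls below 0.05, in line with the statement that the difference is statistically significant; an exact test such as Fisher's could give a slightly different value.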
The absence of a protein in extracts from cells cultured in YPD may also arise because many genes involved in processes such as stress-response, filamentous growth, and gametogenesis are transcriptionally repressed or post-translationally regulated during mitotic growth (17,42-44). Furthermore, it is possible that loci that are transcribed to a level detectable by microarrays may encode proteins that are particularly unstable in vegetatively growing cells (45). Other proteins may escape our detection system because they are not soluble under the conditions we used or they are too small to be captured on the SDS gel system we employed. We note that the latter issue is likely not critical because we detected 669 proteins of less than 150 amino acids at least once. Other potential issues are that peptides may ionize at a low frequency or not at all, that hydrophobic peptides can suppress the ion signal of hydrophilic peptides, and that highly concentrated peptides can mask the ion signals of less abundant ones, making it impossible to detect them (39,46,47). It is conceivable that increasing the number of slices beyond 30 may lead to the detection of additional proteins. However, given that the current DIPP approach detects the vast majority of the proteins expected to be present in mitotic cells growing under optimal conditions, it is unclear if a potentially marginal improvement justifies the additional cost and labor.
The output of our profiling study is mostly coherent with the level of confidence attributed to different classes of annotated genes because most proteins we find in both samples fall into the group of verified ORFs whereas predicted proteins consistently absent are often encoded by dubious genes (see supplemental File S1). We did, however, detect a surprisingly large number of proteins that appear to be associated with loci that do not fulfill the classical criteria for a bona fide yeast gene (such as a minimal number of codons, lack of overlap with another gene on the opposite strand, and sequence conservation). It is safe to assume that efforts to comprehensively annotate the budding yeast genome will require the output of comparative genomics as well as RNA and protein profiling work.
A critical issue of DIPP is its limited ability to quantify protein concentrations. Current MS data yield indirect information about abundance via the number of peptides associated with a given protein but this measure is imprecise. As a consequence DIPP is not a quantitative method but this is compensated by the fact that it is applicable to a large number of experimental conditions, which may be difficult or impossible to study by approaches based on SILAC. Moreover, its workload is comparatively moderate putting proteomics on a large scale within the reach of many laboratories with access to standard MS equipment. Finally, we are currently implementing a new approach known as peptide-count or absolute quantification that will help quantify protein concentrations based on how often a given peptide was detected; this will further enhance the analytical power of our method and open up the avenue for the yeast field to rapid, cost-effective and robust analysis of a very wide range of experimental conditions akin to those that have been studied with microarrays for the past 16 years.