Toward a Comprehensive Atlas of the Physical Interactome of Saccharomyces cerevisiae

Defining protein complexes is critical to virtually all aspects of cell biology. Two recent affinity purification/mass spectrometry studies in Saccharomyces cerevisiae have vastly increased the available protein interaction data. The practical utility of such high throughput interaction sets, however, is substantially decreased by the presence of false positives. Here we created a novel probabilistic metric that takes advantage of the high density of these data, including both the presence and absence of individual associations, to provide a measure of the relative confidence of each potential protein-protein interaction. This analysis largely overcomes the noise inherent in high throughput immunoprecipitation experiments. For example, of the 12,122 binary interactions in the general repository of interaction data (BioGRID) derived from these two studies, we marked 7504 as being of substantially lower confidence. Additionally, applying our metric and a stringent cutoff we identified a set of 9074 interactions (including 4456 that were not among the 12,122 interactions) with accuracy comparable to that of conventional small scale methodologies. Finally we organized proteins into coherent multisubunit complexes using hierarchical clustering. This work thus provides a highly accurate physical interaction map of yeast in a format that is readily accessible to the biological community.

siae, these physical connections have been defined in large scale experiments using the yeast two-hybrid method (1,2) as well as direct purification of complexes using affinity tags followed by mass spectrometry analyses. In 2002, two initial studies utilized the latter strategy on subsets of the proteome (3,4). Ho et al. (4) used an overexpression strategy combined with a single affinity purification step, whereas Gavin et al. (3) used a tandem affinity purification (TAP) system in which epitope-tagged proteins were expressed under normal physiological conditions. The use of an overexpression system may facilitate detection of weaker or more transitory associations between proteins or protein complexes but might be less optimal for accurate definition of stoichiometric interactions. Indeed the purification of proteins expressed under normal physiological conditions followed by mass spectrometry provided the best coverage and accuracy for detection of stable protein complexes (5). Based on these considerations, two separate groups interrogated the physical interactome of S. cerevisiae using this strategy (6,7).
Although a similar approach was used for protein purification and identification, the resulting datasets were subjected to different analytical methods to define PPIs and protein complexes. Gavin et al. (6) exploited a "socio-affinity" scoring system that measures the log-ratio of the number of times two proteins are observed together relative to what would be expected from their frequency in the dataset. Importantly this approach takes advantage of not only direct bait-prey connections but also indirect prey-prey relationships where two proteins are each identified as preys in a purification in which a third protein is used as bait. Krogan et al. (7), on the other hand, used a synthesis of machine learning techniques including Bayesian networks and C4.5-based and boosted stump decision trees to define confidence scores for potential interactions based on direct bait-prey observations. The two groups also used different clustering algorithms to define protein complexes from their PPI datasets. For example, Krogan et al. (7) used a Markov clustering algorithm (8) for definition of protein complexes, whereas Gavin et al. (6) utilized a different clustering approach to define complexes, each consisting of groups of proteins termed "core," "module," or "attachment." Modules were intended to represent subcomplexes that are components of several distinct complexes, and attachments were factors less stably associated with stable core complexes. Although both of these individual datasets are of high quality, it is not obvious how discrepancies between them should be resolved, and each still contains a substantial number of false positive interactions that can compromise the utility of these data for guiding more focused studies.
In this study, we merged these two datasets into a single reliable collection of experimentally based PPIs by analyzing the primary affinity purification data using a novel purification enrichment (PE) scoring system. Using a well defined reference set of manually curated PPIs, we demonstrated that our consolidated dataset is of greater accuracy than the individual sets and is comparable to PPIs defined using more conventional small scale methodologies. Although algorithms designed to detect multiprotein complexes can be highly effective for extracting additional information from noisy and incomplete datasets, attempting to strictly define protein complexes may not be the optimal way to analyze such a high confidence dataset. In particular, any partitioning analysis must either group together distinct complexes that share one or more subunits or fail to correctly identify all of the components of such complexes. Additionally weak interactions between proteins or protein complexes may be lost. In this work, we subjected the entire high confidence PPI dataset to a relatively unbiased hierarchical clustering from which one can more easily identify shared components of distinct complexes as well as weak associations between complexes. We argue that this representation provides a convenient tool for biologists to gather information about a protein of interest rapidly. Finally this depiction potentially mimics the in vivo environment: a continuum of weak associations between stable protein complexes.

EXPERIMENTAL PROCEDURES
Calculation of Purification Enrichment Scores-PE scores were modeled after a discriminant function for a Bayes classifier (9) as a measure of the likelihood of observed experimental results given the hypothesis that an interaction is genuine relative to the likelihood of the same results if the interaction is not real. These scores incorporate ideas from the socio-affinity scoring system reported by Gavin et al. (6) but differ in several significant ways. First, these scores take into account not only positive evidence for an interaction contained in the identification of two proteins in the same purification but also negative evidence against interactions wherein one protein fails to be identified as a prey when another is used as a bait. This negative evidence has typically not been used in previous interaction scoring techniques, and it can be particularly useful for distinguishing non-interacting pairs of proteins that share many interaction partners from pairs that do exist in stable complexes. Second, PE scores more powerfully exploit situations in which a particular bait protein was used in multiple separate purifications. Third, the PE scoring strategy uses a different model for the likelihood of observing a pair of proteins in the same purification if these proteins do not interact. PE scores were motivated by the probabilistic framework of a (naïve) Bayes classifier. In a Bayes classifier, an estimate of the probability of one hypothesis (here that an interaction is real) relative to the probability of a second hypothesis (here that the interaction is not real), given a set of observations, is calculated to determine which hypothesis is more likely. Both of these probabilities are calculated using Bayes' theorem, and a discriminant function f is calculated as the log-ratio of these probabilities. An interaction is classified as real if f > 0 and false if f < 0 (9).
The function f is defined as

$$f(\text{all\_observations}) = \log_{10} \frac{P(\text{all\_observations} \mid \text{true\_PPI}) \times P(\text{true\_PPI})}{P(\text{all\_observations} \mid \text{false\_PPI}) \times P(\text{false\_PPI})}$$ (Eq. 1)

where P(true_PPI) and P(false_PPI) represent prior expectations for the fraction of all protein pairs that do and do not interact physically. Under the naïve assumption that the individual observations are independent, the above equation can be rewritten as follows.

$$f(\text{all\_observations}) = \log_{10} \frac{P(\text{true\_PPI})}{P(\text{false\_PPI})} + \sum_{\text{observations}} \log_{10} \frac{P(\text{observation} \mid \text{true\_PPI})}{P(\text{observation} \mid \text{false\_PPI})}$$ (Eq. 2)

Although the accuracy of a Bayes classifier will rely on an appropriate value for P(true_PPI) and the correct value is not obvious, an incorrect choice of this value will not affect the ordering of scores for putative interactions. We therefore computed PE scores as a sum of the evidence supporting or disaffirming each potential interaction over all relevant purifications in the dataset. For a particular observation, this evidence was computed as an estimate of the corresponding term in the above sum.
$$\text{Evidence}_{\text{observation}} = \log_{10} \frac{P(\text{observation} \mid \text{true\_PPI})}{P(\text{observation} \mid \text{false\_PPI})}$$ (Eq. 3)

A PE score of 0 then indicates that no evidence for or against the validity of a particular interaction was collected (and in theory the probability that such an interaction is true should be equal to the prior estimate of P(true_PPI)). In particular, we considered two types of observations in the construction of PE scores: bait-prey observations when one of the proteins of interest was used as a bait and prey-prey observations when the two proteins of interest both appeared as preys in the purification of a third protein. As a result, similar to socio-affinity scores (6), PE scores can be written as a sum of direct bait-prey components (S) and an indirect prey-prey component (M). Thus, for a potential interaction between proteins i and j,

$$PE_{ij} = S_{ij} + S_{ji} + M_{ij}$$ (Eq. 4)

where $S_{ij}$ measures evidence from purifications where protein i was used as bait, $S_{ji}$ measures evidence from purifications where protein j was used as bait, and $M_{ij}$ measures indirect evidence due to co-occurrence of proteins i and j as preys in the same purifications. Below we give detailed equations used to compute the S and M components.

$$S_{ij} = \sum_{k} s_{ijk}$$ (Eq. 5)

where each value of k indicates a distinct purification in which protein i was used as bait and $s_{ijk}$ represents the corresponding evidence computed using Equation 3. The probabilities P(observation | true_PPI) and P(observation | false_PPI) used to define $s_{ijk}$ were calculated based on estimates of two underlying probabilities: r representing the probability that a true association will be preserved and detected in a purification experiment and $p_{ijk}$ representing the probability that a bait-prey pair will be observed for nonspecific reasons. Using these quantities, we calculate

$$s_{ijk} = \log_{10} \frac{r + (1 - r) \times p_{ijk}}{p_{ijk}}$$ (Eq. 6)

if protein j appeared as a prey in purification k using bait i and

$$s_{ijk} = \log_{10} (1 - r)$$ (Eq. 7)

otherwise. Values for r and $p_{ijk}$ could in principle be estimated in a number of ways.
Here we estimated r using the observed frequency of successful purification over a very high confidence set of interactions [...] where $n_{ik}^{\text{prey}}$ is the number of preys identified in purification k with bait i, $n_{i}^{\text{bait}}$ is the number of times protein i was used as bait, and $f_j$ is an estimate of the nonspecific frequency of occurrence of prey j in the dataset. The relative values of the $f_j$ are estimates of relative rates at which different preys occur nonspecifically (and can be considered measures of relative promiscuity), and the sum of the $f_j$ can be considered to be the fraction of all prey identifications that are nonspecific. Although alternate strategies could be used, for simplicity we allowed the sum of the $f_j$ to be 1, and we computed $f_j$ as Bayesian posterior estimates based on the observed frequency of occurrence of preys in the dataset and the prior hypothesis that all preys occur nonspecifically with equal frequency,

$$f_j = \frac{n_{j}^{\text{prey\_obs}} + n_{\text{pseudo}}}{n_{\text{tot}}^{\text{prey\_obs}} + (n_{\text{distinct\_preys}} \times n_{\text{pseudo}})}$$ (Eq. 9)

where $n_{j}^{\text{prey\_obs}}$ is the total number of observations of protein j as a prey, $n_{\text{tot}}^{\text{prey\_obs}}$ is the total number of observations of all preys, $n_{\text{distinct\_preys}}$ is the number of distinct preys observed, and $n_{\text{pseudo}}$ is a number of pseudocounts added for each prey that determines the weight given to the prior hypothesis. Values of 20, 10, and 5 were used for $n_{\text{pseudo}}$ for the Krogan et al. (7), Gavin et al. (6), and Ho et al. (4) datasets, respectively. The value of $n_{\text{pseudo}}$ was the only parameter adjusted to optimize the PE scoring system. Adjustments were done using the MIPS complexes as a reference, and for this reason results of all comparisons made using a reference set based on the MIPS complexes were duplicated using an independent reference set generated from the SGD complexes.
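The bait-prey part of the score can be illustrated with a minimal Python sketch. It assumes the pseudocount estimate of Equation 9 and the evidence term of Equation 6; the value used when the prey is not observed, log10(1 - r), is what the same model implies for a missed detection and is treated here as an assumption. The function names and the example values of r and p are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def nonspecific_frequencies(prey_observations, n_pseudo):
    """Pseudocount estimate of f_j for every prey (form of Eq. 9).

    prey_observations: one entry per prey identification in the dataset.
    n_pseudo: pseudocounts added for each distinct prey.
    """
    counts = Counter(prey_observations)
    n_tot = sum(counts.values())
    denominator = n_tot + len(counts) * n_pseudo
    return {prey: (n + n_pseudo) / denominator for prey, n in counts.items()}


def bait_prey_evidence(observed, r, p):
    """Evidence s_ijk from a single purification of bait i.

    observed: True if prey j was identified in this purification.
    r: probability that a true association is preserved and detected.
    p: probability of observing the pair for nonspecific reasons.
    """
    if observed:
        # Form of Eq. 6: P(seen | true) / P(seen | false)
        return math.log10((r + (1.0 - r) * p) / p)
    # Assumed complement case: P(missed | true) / P(missed | false) = 1 - r
    return math.log10(1.0 - r)


def direct_score(observations, r):
    """S_ij as the sum of s_ijk over all purifications of bait i.

    observations: list of (observed, p_ijk) tuples, one per purification.
    """
    return sum(bait_prey_evidence(seen, r, p) for seen, p in observations)
```

With these definitions the f_j values always sum to 1, and a single failed purification can only lower S_ij, which is how negative evidence enters the score.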
The M component was calculated as

$$M_{ij} = \sum_{k} m_{ijk}$$ (Eq. 10)

where each value of k indicates one purification in which proteins i and j were simultaneously observed as preys. In this case, our approach differs slightly from the full Bayesian classifier approach, which would either sum over all purifications or sum over all purifications in which at least one of the two proteins was identified as a prey. We did not use a sum over all purifications because it would require an enormous number of calculations and because estimation of all of the relevant probabilities is itself a very difficult problem. We instead created an approximate implementation of Equation 3 for $m_{ijk}$ calculated only for observations where both preys were observed in the same purification. Significantly we did not include a negative term for the case in which only one of the two proteins was observed as a prey in a purification. This was because two proteins can interact yet also be components of alternate complexes. Our implementation was again based on estimates for two underlying probabilities. Here we used r to represent the probability that a true association between proteins i and j will be preserved and detected during a purification experiment and $p_{ijk}$ to represent the probability that proteins i and j will appear as preys in the same purification for nonspecific reasons.
We used the same estimate for r as calculated above, and for $p_{ijk}$ we used an estimate of the probability that proteins i and j will occur nonspecifically as preys in the same purification at least once in the dataset. This value for $p_{ijk}$ is calculated using the Poisson distribution as

$$p_{ijk} = 1 - \exp\left(-f_i \times f_j \times n_{\text{tot}}^{\text{prey-prey}}\right)$$ (Eq. 12)

where $f_i$ and $f_j$ are computed as described above, and $n_{\text{tot}}^{\text{prey-prey}}$ is the total number of prey-prey pairs observed in the dataset.
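The prey-prey component can be sketched in the same way. The snippet below assumes that the expected number of nonspecific co-occurrences of preys i and j is the product of their nonspecific frequencies and the total number of prey-prey pairs, so that the Poisson probability of at least one co-occurrence is 1 - exp(-expected). It also assumes that each co-occurrence contributes evidence of the same form as the bait-prey term in Equation 6. Both assumptions, and all function names, are illustrative rather than a statement of the exact implementation.

```python
import math


def prey_prey_background(f_i, f_j, n_tot_prey_prey):
    """Poisson probability that preys i and j co-occur nonspecifically
    at least once, assuming an expected count of f_i * f_j * n_tot."""
    expected = f_i * f_j * n_tot_prey_prey
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x
    return -math.expm1(-expected)


def prey_prey_evidence(r, p):
    """Assumed form of m_ijk: log-ratio of observing the co-occurrence
    under a true versus a false interaction."""
    return math.log10((r + (1.0 - r) * p) / p)


def indirect_score(n_co_occurrences, r, p):
    """M_ij: sum of m_ijk over purifications in which both proteins were
    preys. No negative term is added when only one of them is observed."""
    return sum(prey_prey_evidence(r, p) for _ in range(n_co_occurrences))
```

Because promiscuous preys have large f values, their background probability approaches 1 and each co-occurrence then contributes almost no evidence.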
The Krogan et al. (7) and Gavin et al. (6) data were combined by computing a score for each putative interaction independently over each dataset and adding them as follows.
This weighted sum was used instead of a straight sum because empirically it was a more effective predictor of PPIs, and in practice this may be due to redundancy of the Krogan et al. (7) LCMS-MS and MALDI-TOF data.
Clustering of PPI Data-First, scaled PE scores were computed for use in hierarchical clustering to minimize variation in scores that does not correspond to variation in the reliability of the represented interactions. For example, PE scores of 10 and 20 may both correspond to extremely reliable interactions, but a score of 0 likely indicates a non-interaction. The scaled scores range from 0 to 1 and were intended to approximate confidence values (i.e. a scaled score of 0.8 would correspond to 80% likelihood of a true interaction). However, these values were not carefully trained and should not be taken as reliable confidence values. Equations used for calculating these values are detailed below. A vector of scaled PE scores was then created for each protein that had at least one scaled score of 0.2 or higher (corresponding to a PE score threshold of 1.85). A value of 1 was assigned for the diagonal elements (representing self-interaction) so that interacting proteins would tend to cluster together. These data were then hierarchically clustered using the uncentered correlation metric and the average linkage method with the Cluster 3.0 program (10). Results were visualized, and figure images were created using the Java Treeview program (50).
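As an illustration of the input to this clustering step, the following Python sketch builds one vector of scaled scores per protein, sets the diagonal to 1, and computes the uncentered correlation between two vectors (equivalent to cosine similarity). The clustering itself was performed with Cluster 3.0; this sketch only shows the similarity measure, and the data layout (a dictionary keyed by unordered protein pairs) and function names are assumptions.

```python
import math


def select_proteins(scaled_scores, threshold=0.2):
    """Proteins with at least one scaled score at or above the threshold.

    scaled_scores: dict mapping frozenset({a, b}) -> scaled PE score.
    """
    kept = set()
    for pair, score in scaled_scores.items():
        if score >= threshold:
            kept.update(pair)
    return sorted(kept)


def interaction_vectors(proteins, scaled_scores):
    """One vector per protein; diagonal entries (self-interaction) are 1."""
    return {
        a: [1.0 if a == b else scaled_scores.get(frozenset((a, b)), 0.0)
            for b in proteins]
        for a in proteins
    }


def uncentered_correlation(x, y):
    """Correlation computed without subtracting the means of x and y."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm = math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y))
    return dot / norm if norm else 0.0
```

Setting the diagonal to 1 means that two strongly interacting proteins share two large entries in their vectors, which raises their similarity and pulls them into the same block.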
Scaled scores represent a monotonic mapping of PE scores onto the interval 0-1. They would represent confidence values given the approximations that 1) binary interactions in MIPS complexes represent an unbiased subset of the set of all true binary protein-protein interactions, 2) MIPS small scale experiments are ~95% accurate, and 3) the set of MIPS complexes is independent of the results contained in MIPS small scale experiments. They were computed using the slope of a "coverage curve" of the cumulative number of interactions detected that were annotated in MIPS complexes versus the total number of interactions identified (see Supplemental Fig. 3). For each PE score, a corresponding slope in the coverage curve was computed by local linear regression. The resulting slopes were made monotonic (as a function of PE score) and smoothed using the pool adjacent violators algorithm (11) and LOESS regression (12). To convert these slopes to scaled scores, they were divided by the fraction of interactions included in the MIPS small scale experiments (excluding two-hybrid studies) that were also contained in the MIPS complexes (461 of 1081). The resulting values were multiplied by 0.95, and an upper bound of 0.99 was applied. Scaled scores below 0.05 were set to 0 for computational expediency.
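The last two steps of this conversion can be sketched in Python. The snippet gives a plain, unweighted pool adjacent violators routine that makes a sequence of slopes nondecreasing, followed by the rescaling described above (division by 461/1081, multiplication by 0.95, a cap of 0.99, and a floor of 0.05). The local linear regression and LOESS smoothing are omitted, and the function names and the assumption that slopes are ordered by increasing PE score are illustrative.

```python
def pool_adjacent_violators(values):
    """Nondecreasing least-squares fit to values (unweighted PAVA).

    values: slopes ordered by increasing PE score.
    """
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last one
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def slope_to_scaled_score(slope, reference_overlap=461 / 1081,
                          accuracy=0.95, cap=0.99, floor=0.05):
    """Convert a coverage-curve slope into a scaled score."""
    score = min(slope / reference_overlap * accuracy, cap)
    return 0.0 if score < floor else score
```

For example, a run of slopes that dips locally is replaced by the mean of the violating neighbors, so higher PE scores never map to lower scaled scores.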
Gene Ontology (GO) and GOslim Annotations-GO (13) and GOslim annotations were obtained from SGD (14) on March 7, 2006. Any feature annotated as ORF, pseudo_gene, or transposable_element_gene in SGD was used to calculate the total number of proteins in each GOslim category.
MIPS Small Scale Experiments-A collection of 1081 putative protein-protein interactions identified in small scale experiments was obtained from the MIPS database on March 7, 2006 (15). Two-hybrid experiments were excluded from this set because they appeared to be of lower accuracy. The collection from MIPS was used rather than the larger collection contained in the BioGRID database (16) because the collection in MIPS appeared to be of greater accuracy by each of the metrics we considered.
True Positive and True Negative Calculation-True positives were calculated for PPIs within complexes (for MIPS and SGD). True negatives were taken to be connections between proteins in different complexes if the proteins have a different subcellular localization according to Huh et al. (17) and Kumar et al. (18) or show significant mRNA expression anticorrelation (calculated using a standard correlation coefficient, distance > 1.108328 (corresponds to R < −0.108328 or P < 0.001) over a set of 1000 microarray experiments (19)).
Receiver Operating Characteristic (ROC) Curve Calculations-ROC curves were calculated using PE (and in some cases socio-affinity) scores calculated for all pairs of proteins in the full reference set. Thus a sensitivity value of 1 indicates detection of all true positive examples in the reference set, and a 1 − specificity value of 1 indicates detection of all true negative examples in the reference set. For all ROC curves plotted on the same graph, an identical reference set was used to calculate the curves.
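A minimal Python sketch of such a ROC calculation is given below. It assumes that scores are stored in a dictionary keyed by protein pair and that reference pairs without any score are assigned 0; the function name and data layout are hypothetical.

```python
def roc_points(scores, true_positives, true_negatives):
    """Return (1 - specificity, sensitivity) points as the score
    threshold is lowered from the highest score to the lowest.

    scores: dict mapping a protein pair to its score.
    true_positives, true_negatives: non-empty lists of reference pairs.
    """
    labeled = [(scores.get(pair, 0.0), True) for pair in true_positives]
    labeled += [(scores.get(pair, 0.0), False) for pair in true_negatives]
    labeled.sort(key=lambda item: item[0], reverse=True)

    n_pos, n_neg = len(true_positives), len(true_negatives)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(labeled):
        threshold = labeled[i][0]
        # Pairs with tied scores cross the threshold together
        while i < len(labeled) and labeled[i][0] == threshold:
            if labeled[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points
```

Because all pairs with a score of 0 cross the threshold in one step, a large group of unscored reference pairs appears in the curve as a single straight segment.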
Supporting Website and Database-A searchable website, which contains all the PE scores and PPI clustering, has been created at interactome-cmp.ucsf.edu using Perl, hypertext preprocessor, and a PostgreSQL relational database.
Diploid Bimater Assay-To compare yol054wΔ/yol054wΔ cells to wild type, 1-cm² patches of each were made from independent single colonies, replica-plated to a lawn of tester cells, cultured for 6 h at 30°C, and again replicated to medium selective for rare matings (20). The number of colonies on each patch was counted manually with the median number of colonies on each patch being used to calculate fold change (mutant/wild type ratio). Selection was based on histidine prototrophy because experimental genotypes were MATa/MATα, his3Δ/his3Δ (control) or MATa/MATα, his3Δ/his3Δ, yol054wΔ/yol054wΔ (experiment), and the mating testers were MATa his1 or MATα his1.
a-like Faker Assay-To compare MATα yol054wΔ his3 to MATα his3, 1-cm² patches from independent single colonies were replica-plated to medium selective for rare matings based on histidine prototrophy as above (21).

A Metric for Defining Protein-Protein Interactions-
The recently completed high throughput affinity purification experiments provided hundreds of thousands of putative PPIs. The challenge was then to convert this array of affinity purification data into a set of high confidence PPIs. Due to the high density nature of these studies, there are often many separate observations that provide evidence supporting or disaffirming a potential interaction as well as a significant amount of experimental noise intrinsic to high throughput affinity purification approaches. Clearly and as appreciated in the original studies (6, 7), a simple cataloguing of observed associations does not adequately exploit these data. Instead one would like to integrate all of the data in a uniform manner to fully exploit direct evidence for interactions where one protein was used as a bait and another was identified as a prey, indirect evidence due to the co-occurrence of a pair of preys in identical purifications, and evidence against the validity of an interaction when one protein was used as bait and the other was not identified as a prey. This problem can naturally be cast in terms of Bayesian statistics where one can quantify the evidence that each relevant observation provides for or against the validity of an interaction in terms of the probability of making such an observation if the interaction is true and the probability if the interaction is not true.
$$\text{Evidence}_{\text{observation}} = \log_{10} \frac{P(\text{observation} \mid \text{true\_PPI})}{P(\text{observation} \mid \text{false\_PPI})}$$ (Eq. 14)

Motivated by this framework, we created a novel metric, which we term the PE score. For each putative interaction, this score is a sum of the evidence calculated for each relevant observation in a dataset (detailed equations are provided under "Experimental Procedures"). By several independent metrics including the ability to predict membership in previously annotated complexes, the PE scores appear to identify interactions of higher confidence than the socio-affinity scores of Gavin et al. (6) (Supplemental Fig. 1). PE scores also performed better than scores that only took advantage of the direct bait-prey data from purification experiments (Fig. 1A, "Krogan PPI" point and data not shown). The use of indirect prey-prey information was also a component of the socio-affinity score, and it is conceptually related to a computational approach taken to predict PPIs based on shared interaction partners (22). Although it is clear from those studies (and our own) that there is a wealth of information contained in inferences from indirect prey-prey associations, some care should be taken with interactions inferred solely in this way as it appears that incorrect linkages may occasionally be inferred between proteins sharing a large number of common interaction partners. For this reason, we preserved annotations indicating which interactions were and were not observed directly (see below). We also note that, given a set of purification results, a PE score can be computed for any pair of proteins including, but not limited to, pairs of proteins for which direct or indirect evidence for an interaction was observed. Pairs that never co-purified will either be assigned scores of 0 (if neither protein was used as a bait) or negative scores, indicating that evidence against the potential interactions was collected.
Finally it is important to be aware that the negative interaction data may exhibit some bias with respect to tagging artifacts, protein abundance, and mass spectrometry issues; however, we found that including this information in the analysis increases the quality of the final dataset.
Assessing Confidence of Binary Interactions-A standard method to evaluate the accuracy of a scored interaction dataset is to measure it against a high confidence reference set that is taken to be correct (22). For the calculated PE scores as well as previous mass spectrometry-based datasets, we evaluated accuracy and coverage using a reference set of true positive and true negative interactions generated from manually curated complexes obtained from either MIPS (15) or SGD (14) (Supplemental Fig. 2 and see "Experimental Procedures" for a more detailed description). True positive interactions were taken to be connections between proteins that were annotated as belonging to the same complex in the database (MIPS or SGD). Although such a reference set will contain some false positives, this contamination is unlikely to be biased in favor of a particular dataset. Generating an unbiased set of non-interacting pairs of proteins, or true negative interactions, is more challenging. Nevertheless our results did not seem to be particularly sensitive to the method used to define this set. We defined our set of true negative interactions to be connections between pairs of proteins that were annotated only to distinct complexes and that either had non-overlapping cellular localizations as determined by green fluorescent protein fusion studies (17,18) or had significantly anticorrelated mRNA expression patterns. Although the localization and co-expression criteria we applied will each have their own biases, they both largely deplete known interactions from the true negative set (17,23). With reference sets constructed, we could measure the relative accuracy and coverage of different datasets by creating ROC curves that measure the tradeoff between accuracy and completeness as a function of a score threshold (12). We found that when the PE metric was applied to either the new Gavin et al. (6) or the Krogan et al.
(7) primary co-precipitation data it was possible to identify a substantial (although non-identical) fraction of known protein complexes while excluding the vast majority of the true negative set ( Fig. 1A and Supplemental Fig. 2). Application of the PE score to the co-precipitation data in Ho et al. (4) was significantly less successful at identifying known PPIs (Supplemental Fig. 2), although the difference may be largely due to the smaller quantity of raw data in this dataset. In each of the ROC curves, there is a significant portion of the curve that is linear and has a slope similar to that of the random background. This trend is due to interactions in the reference set that were neither supported nor disaffirmed by the dataset and received scores of 0.
A High Confidence Consolidated Dataset-Subjecting the Gavin et al. (6) and Krogan et al. (7) datasets to the same PE log-likelihood scoring function allowed us to directly combine them into a single comprehensive set that encompasses all of the high throughput TAP purification experiments completed to date. We computed combined scores from both the Krogan et al. (7) and Gavin et al. (6) datasets (see "Experimental Procedures" for detailed equations), and not surprisingly, this consolidated dataset provided greater coverage and accuracy than either of the individual datasets (Fig. 1A and Supplemental Fig. 3). In particular, it was possible to capture ~50% of the previously reported interactions within protein complexes, although the true coverage may be substantially higher because this reference set likely still contains false positives. We chose not to include the Ho et al. (4) data in our consolidated dataset because it was created using a different experimental method, and its inclusion resulted in negligible changes to the resulting ROC curves (data not shown).
Using the true positive and true negative sets of protein pairs described above not only allowed us to compare the processed results of this consolidated dataset to previous high throughput datasets, but it also provided an opportunity to compare our new results with those obtained in small scale experiments that are often taken as a standard for high accuracy (24,25). Consistent with earlier analyses, we found that previous high throughput efforts did not reach the level of accuracy obtained in small scale studies (25). However, using the consolidated dataset, it was possible to define a large set of PPIs with the same calculated true positive to true negative rate as the collection of 1081 pairwise interactions obtained from small scale experiments (excluding two-hybrid studies) in the MIPS database ( Fig. 1A and Supplemental Fig. 4). This true positive to true negative rate suggests a score threshold (of 3.19) that defines a set of 9074 high confidence interactions among 1622 distinct proteins. Consistent with an earlier analysis based on smaller protein-protein interaction networks (26), we found that this network, which is probably enriched for stable interactions relative to more transient ones, is not scale-free (i.e. although the network contains a substantial number of nodes with high degree, the node degree distribution is not described by a power law) (Supplemental Fig. 5).
The suggestion that this subset of 9074 interactions from the consolidated dataset is of comparable confidence to that of a manually curated set of interactions identified in small scale experiments was tested by three additional independent measures: subcellular co-localization, GO annotation, and mRNA co-expression. First because proteins that interact physically tend to have the same subcellular localizations (17,18,25), we compared the published experimentally determined localizations of the putatively interacting protein pairs. Unlike pairs identified in previous high throughput studies, we found that pairs in this high confidence set were more likely to have matching localizations than pairs identified in small scale experiments (Fig. 1B). Next we found that three different classes of GO annotations (cellular component, biological process, and molecular function) were either equally or more likely to match for pairs of interacting proteins in our new set compared with pairs derived from small scale experiments (Fig. 1B). Finally it is known that genes encoding physically interacting proteins are more likely to have similar expression profiles (10,23,27,28), and so we examined the distribution of Pearson correlation coefficients between expression patterns of interacting pairs over a set of 1000 previously published microarray experiments (19). Relative to the pairs identified in small scale experiments, our new high confidence set is significantly enriched for gene pairs with highly similar expression patterns (Fig. 1C and Supplemental Fig. 4). Although this enrichment may reflect better coverage of the ribosome and proteins involved in ribosome biogenesis, the new high confidence set also shows an almost identical lack of anticorrelated gene pairs when compared with the small scale set ( Fig. 1C and Supplemental Fig. 
4), providing further evidence that the consolidated set of PPIs has a very low false positive rate that compares favorably to that of the MIPS small scale dataset.
Comparison of the PPIs generated in this study with ones deposited into BioGRID (16) (which is a primary source for SGD (14)) from the original studies clearly demonstrates that we have defined a more reliable dataset (Fig. 1A). In particular, the 4456 PPIs unique to our set appear to be of confidence comparable to that of the small scale experiments, whereas those unique to either the Gavin et al. (6) or Krogan et al. (7) sets deposited in the databases appear to be of markedly lower confidence as judged by cellular localization and GO annotation (Fig. 2). It should be noted that using the socio-affinity scoring system described by Gavin et al. (6) provides a dataset that, although of lower coverage and accuracy than the new datasets we define here, is of higher confidence than the set deposited in the major databases (Supplemental Fig. 1). We also note that although in general they should be considered of lower confidence, the interactions unique to the Gavin et al. (6) or Krogan et al. (7) sets are still likely to contain a number of physiologically relevant associations. The high confidence set of interactions defined here, similar to other PPI datasets derived from high throughput studies (5-7), shows some apparent bias toward high abundance proteins and against proteins from certain cellular compartments (such as the cell wall and the plasma membrane) (Supplemental Fig. 6). These biases probably reflect experimental limitations but may also to some extent reflect real features of the distribution of protein complexes in yeast.
A Portrait of the Physical Interactome Map-Several methods to accurately define protein complexes were explored using the high confidence consolidated PPI dataset. Using such analyses as a final representation, however, often results in unwanted consequences such as the merging of several clearly distinct complexes that share one or more subunits. Also information regarding weak associations between protein complexes can be lost. To overcome these difficulties and in an attempt to visualize the physical interactome as it exists in vivo, we subjected the patterns of PPIs for all proteins having at least one interaction with a scaled PE score (see "Experimental Procedures") above 0.20, a criterion encompassing almost 2400 proteins, to hierarchical clustering (Fig. 3A) (see "Experimental Procedures" for a more detailed description). The threshold used here, which is lower than the one used above to define 9074 high confidence interactions, was used to allow a more complete interaction map. Stable, stoichiometric protein complexes are, for the most part, accurately recapitulated as distinct blocks along the diagonal, whereas PPIs that reside off the diagonal either represent shared subunits of complexes or weak associations between complexes (Fig. 3, B and C). A clear example of the former emerges from four complexes that are involved in chromatin function: NuA4 (29-31), SWR-C (31-33), INO80C (34), and the helicase chaperone complex Tah1-Pih1 (35) (Fig. 3D). Visual inspection of the off-diagonal connections demonstrates that the DNA helicases Rvb1 and Rvb2 are components of the INO80C, SWR-C, and Tah1-Pih1, but not NuA4, protein complexes (Fig. 3D). Similarly Swc4 and Yaf9 are shared components of SWR-C and NuA4, whereas the actin-related proteins Act1 and Arp4 are part of the SWR-C, NuA4, and INO80C, but not the Tah1-Pih1, complexes. Further inspection of the off-diagonal connections (Fig. 3B) reveals that Tra1 is a shared subunit of SAGA (36) and NuA4 (29), Taf14 resides both in TFIIF and INO80C (37), and actin (Act1) physically associates with several factors involved in cytoskeleton formation in addition to being a subunit of multiple chromatin remodeling complexes (38). A different region of the clustergram nicely demonstrates that Sec13 is part of both the Nup84 nucleoporin (39) and the coatomer COPII complexes (40) (Fig. 3E). Further inspection reveals that Sec23, a component of the COPII complex, seems to be independently associated with the three members of the Sec24 family, Sec24, Sfb2, and Sfb3, a phenomenon that has been characterized previously (41) (see interactome-cmp.ucsf.edu for more comprehensive views of the clustergrams).
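The selection and clustering steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the procedure from "Experimental Procedures": the Euclidean distance between interaction profiles, the self-score of 1.0, average linkage, and stopping at a fixed number of clusters are all choices made here for brevity, and the protein names and scores are invented. Only the 0.20 cutoff comes from the text.

```python
from math import sqrt


def select_proteins(scores, cutoff=0.20):
    """Proteins with at least one interaction scoring above the cutoff."""
    kept = set()
    for (a, b), score in scores.items():
        if score > cutoff:
            kept.update((a, b))
    return sorted(kept)


def profile(protein, proteins, scores):
    """Score vector of one protein against all kept proteins.

    The self-score of 1.0 is an assumption made for this sketch.
    """
    return [1.0 if q == protein
            else scores.get((protein, q), scores.get((q, protein), 0.0))
            for q in proteins]


def distance_matrix(proteins, scores):
    """Euclidean distance between interaction profiles (assumed metric)."""
    vecs = {p: profile(p, proteins, scores) for p in proteins}
    return {a: {b: sqrt(sum((x - y) ** 2 for x, y in zip(vecs[a], vecs[b])))
                for b in proteins}
            for a in proteins}


def average_linkage(proteins, dist, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in proteins]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                mean = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or mean < best[0]:
                    best = (mean, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Invented scores: two tight groups, one weak link, one sub-cutoff pair.
scores = {
    ("A1", "A2"): 0.90, ("A1", "A3"): 0.80, ("A2", "A3"): 0.85,
    ("B1", "B2"): 0.90,
    ("A3", "B1"): 0.25,   # weak association between the two groups
    ("X1", "X2"): 0.10,   # below the cutoff, so both proteins are dropped
}
proteins = select_proteins(scores)
clusters = average_linkage(proteins, distance_matrix(proteins, scores), 2)
```

In this toy input the weak A3-B1 score does not merge the two groups, which mirrors how an off-diagonal connection can coexist with distinct blocks along the diagonal of a clustergram.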
Using a two-color scheme overlaid on the clustering analysis (Fig. 3A), we distinguished interactions that were observed directly as bait-prey pairs (yellow) from those that were solely inferred on the basis of co-purification as preys in the same experiments (blue). Strikingly the physical composition of the ribosome is primarily inferred from indirect (prey-prey) interactions. Mainly due to the purification protocols used, neither the Krogan et al. (7) nor Gavin et al. (6) studies successfully purified tagged subunits of the ribosome, although both works often obtained ribosomal proteins as preys. Krogan et al. (7) filtered these promiscuous proteins from their dataset, and although Gavin et al. (6) retained the ribosomal protein data, it resulted in the inference of many complexes containing various subsets of the ribosome. Instead in this unbiased representation, the ribosome remarkably appears as a single complex along the diagonal, largely free of nonspecific off-diagonal connections.
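The distinction between directly observed (bait-prey) and inferred (prey-prey) pairs can be sketched as follows. The record layout, the function name, and the protein names are hypothetical; the sketch assumes each purification is available as a bait together with the list of preys recovered with it, and it labels a pair as bait-prey if it was observed that way in at least one purification.

```python
from itertools import combinations


def classify_pairs(purifications):
    """Label each co-purifying pair as 'bait-prey' or 'prey-prey'.

    purifications: list of (bait, [preys]) records.
    """
    labels = {}
    for bait, preys in purifications:
        # Direct observations always take precedence.
        for prey in preys:
            if prey != bait:
                labels[frozenset((bait, prey))] = "bait-prey"
        # Pairs of preys are only inferred, so never overwrite a direct label.
        for a, b in combinations(sorted(set(preys) - {bait}), 2):
            labels.setdefault(frozenset((a, b)), "prey-prey")
    return labels


# Placeholder records: RibA and RibB are never used as baits.
purifications = [
    ("BaitX", ["RibA", "RibB"]),
    ("BaitY", ["RibA", "RibB", "BaitX"]),
]
labels = classify_pairs(purifications)
```

Here the RibA-RibB pair is supported only by co-purification as preys, the situation described above for ribosomal proteins.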
As a further demonstration that hierarchical clustering of the consolidated data is potentially more informative than the lists of complexes presented in the original studies, we used a different two-color scheme (yellow and red) to highlight interactions that were not present in either the inferred protein complexes from Gavin et al. (6) or Krogan et al. (7) (Fig. 4, A-D). These new interactions may have been identified due to the improved scoring system, the simultaneous consideration of both raw datasets, or a combination of these factors. Consistent with the trends observed in the ROC curves (Fig. 1A), a number of previously characterized PPIs were only seen with the new analyses. For example, six subunits of the transcriptional elongation complex Elongator have been characterized previously (42-44), but only in this new representation was the smallest subunit, Elp6, actually confirmed (Fig. 4A). Similarly in our new merged PPI dataset it is clear that Sec20 is a component of the Dsl1 complex, required for stability of the Q/t-SNARE complex at the endoplasmic reticulum (45) (Fig. 4B), and that Dad2 and Dad3 are components of the DASH microtubule ring complex (46) (Fig. 4C).
An example of a weak association between two distinct sets of proteins revealed by the hierarchical clustering is represented in Fig. 4D. The MIND (Mtw1 including Nnf1-Nsl1-Dsn1) complex (47) is seemingly associated with the kinetochore complex through one of its subunits, Ame1. A relatively weak association also exists between several subunits of the inner and outer kinetochore (Ame1, Mcm22, Okp1, Chl4, and Nkp2) (48) and an uncharacterized protein, Yol054w, a connection not present in the complexes derived in the Krogan et al. (7) and Gavin et al. (6) studies (Fig. 4D). Consistent with this hypothesis, deletion of YOL054w results in genomic instability as measured by bimater (20,49) and "a-like faker" (21) assays (Fig. 4, E and F).
DISCUSSION
With the two largest high throughput studies of protein-protein interactions in yeast (or any other organism) recently completed, two questions arise: how completely have interactions been identified, and how accurately have they been determined? With respect to coverage, a rough calculation based on the degree of overlap between the two recent studies suggests that they cover ~80% of interactions accessible to the TAP approach under the conditions used.
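The text does not spell out how the coverage figure was calculated. One common way to turn the overlap between two independent surveys into a coverage estimate is the Lincoln-Petersen (capture-recapture) estimator, sketched below purely as an illustration; it is an assumption that anything like this was used here. The function name and the counts are hypothetical and are not taken from either study.

```python
def overlap_coverage(n_a, n_b, n_shared):
    """Estimate the coverage of two surveys from their overlap.

    Lincoln-Petersen estimate of the total: n_a * n_b / n_shared.
    Coverage is the size of the union divided by that estimated total.
    Assumes the two surveys sample the same population independently.
    """
    estimated_total = n_a * n_b / n_shared
    union = n_a + n_b - n_shared
    return union / estimated_total


# Illustrative counts only (not from the studies discussed here).
coverage = overlap_coverage(n_a=40, n_b=30, n_shared=20)
```

With these made-up counts the estimated total is 60 interactions and the union of the two surveys contains 50, giving a coverage of about 83%. The independence assumption is strong, so an estimate of this kind is best read as a rough guide.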
In terms of accuracy, we demonstrate here that high throughput identification of protein-protein interactions has reached a new landmark. For the first time, this consolidated dataset can match the reliability of small scale experiments. By simultaneously analyzing the two recent studies with one scoring system and creating a single merged dataset, we were able to generate a large set of PPIs ordered according to a score that indicates the strength of experimental evidence supporting their validity. In particular, we were able to identify a large subset of ~9000 of these interactions, which by several independent metrics appear to be of equal or greater accuracy than that attained in a collection of small scale experiments. More valuable than these high accuracy binary interactions, however, may be the portrait of the yeast physical interactome that emerges from them through hierarchical clustering. The weak but reproducible interactions that appear between well defined complexes or between the individual components within these complexes and other proteins can be used to generate a number of hypotheses for future research.
Although identification of stable protein complexes that survive TAP purification may be nearing saturation for S. cerevisiae, much work remains in characterizing PPIs. For example, because a precise estimate of the false positive rates for the PPI datasets presented here remains elusive, a systematic reanalysis of a subset of these putative interactions using small scale methods may be very valuable. Also further identification of transient associations between well defined complexes, perhaps by further exploiting the yeast two-hybrid system, will prove insightful. The dynamics of protein-protein interactions in response to changes in the environment have yet to be systematically explored. Obtaining low resolution structural analyses of the defined complexes using electron microscopy and determining which protein post-translational modifications are involved in mediating PPIs are also of immediate interest. Furthermore efforts should be made to more quantitatively characterize protein-protein interactions, perhaps by using technologies amenable to detecting PPIs in vivo. Finally considering that such significant biological information was extracted from yeast using this approach, a similar comprehensive strategy for defining the physical interactome in more complex organisms should be undertaken.