Popular Computational Methods to Assess Multiprotein Complexes Derived From Label-Free Affinity Purification and Mass Spectrometry (AP-MS) Experiments*

Advances in sensitivity, resolution, mass accuracy, and throughput have considerably increased the number of protein identifications made via mass spectrometry. Despite these advances, state-of-the-art experimental methods for the study of protein-protein interactions yield more candidate interactions than may be expected biologically owing to biases and limitations in the experimental methodology. In silico methods, which distinguish between true and false interactions, have been developed and applied successfully to reduce the number of false positive results yielded by physical interaction assays. Such methods may be grouped according to: (1) the type of data used: methods based on experiment-specific measurements (e.g., spectral counts or identification scores) versus methods that extract knowledge encoded in external annotations (e.g., public interaction and functional categorisation databases); (2) the type of algorithm applied: the statistical description and estimation of physical protein properties versus predictive supervised machine learning or text-mining algorithms; (3) the type of protein relation evaluated: direct (binary) interaction of two proteins in a cocomplex versus probability of any functional relationship between two proteins (e.g., co-occurrence in a pathway or subcellular compartment); and (4) initial motivation: elucidation of experimental data by evaluation versus prediction of novel protein-protein interactions, to be experimentally validated a posteriori. This work reviews several popular computational scoring methods and software platforms for protein-protein interaction evaluation according to their methodology, comparative strengths and weaknesses, data representation, accessibility, and availability. The scoring methods and platforms described include: CompPASS, SAINT, Decontaminator, MINT, IntAct, STRING, and FunCoup. References to related work are supplied throughout to give a concise but thorough introduction to a rapidly growing interdisciplinary field of investigation.

detected by isolation of complexes from their biological source, but instead originate from studies using the yeast two-hybrid (Y2H) system (supplemental Table S1). Despite a growing trend in deposition of large protein interaction data sets from isolation studies and components identified by proteomics approaches, as described later in this publication (5,6), Y2H studies have yielded a rich source of protein interaction data (supplemental Tables S1, S2) over the past three decades. Y2H determines physical interaction between two proteins (binary interactions) by means of transcriptional activity (7). The method employs yeast transcription factors, such as Gal4, with activating (AD) and DNA binding (BD) domains. The AD and BD domains can be fused respectively to a variety of prospective interacting partners, commonly referred to as baits and preys. If bait and prey interact, the two domains are brought together to form a functional transcription factor that indicates protein interaction via expression of a reporter gene. Developments of the approach include attempts to improve specificity, for example via use of cytokine receptor activation for greater physiological relevance (8). Regardless of the specific approach taken, libraries of baits and preys can be genetically engineered to test numerous combinations in large-scale studies. Despite the advantages of obtaining large data sets (9-11), the false positive rate (FPR) associated with this approach has been reported in the past to be greater than 50% in some instances (12-15). Recent studies have reduced error rates, in particular via the combination of additional methodologies such as LUMIER (16), MAPPIT (17), protein complementation assay (PCA) (18), and NAPPA (19), as shown by Vidal and coworkers (20).
False positives (FP) arise from differences in genetic background, differences in quantitative measures of interaction, use of nonphysiological conditions, or extensive testing of protein partners that do not normally exist in the same cellular compartment (12,15). Finally, although Y2H largely tests direct interactions between bait and prey, there may also be occasions in which interaction is mediated by a third yeast protein and thus bait and prey interact indirectly. For further reviews focused on the minimization of FP associated with direct binary protein interaction data (primarily Y2H), the reader is directed to the works of Vidal, Uetz, and Vazquez (21-23).
Protein microarrays also enable the identification of (binary) protein-protein interactions systematically on a genome scale. Bait proteins are individually expressed in yeast strains and subsequently purified. The purified proteins are arrayed on a slide and subsequently incubated with prey proteins labeled in some manner, such as tagging with a fluorescent marker, to monitor their interaction. Protein arrays have also been used to investigate interactions between proteins and small molecules, kinase-substrate interactions, or antibody-mediated protein profiling. Observed limitations of this method concern aberrant protein folding and nonphysiologically relevant post-translational modification (PTM) status of the yeast-derived bait (24-26). In addition, there exists a plethora of biophysical experimental methods with which to investigate PPIs, including structural studies of protein complexes using x-ray crystallography and NMR, surface plasmon resonance (SPR), fluorescence resonance energy transfer (FRET), and isothermal titration calorimetry (ITC). Many of these approaches are reviewed in (27,28).
Affinity purification coupled to mass spectrometry (AP-MS) represents a more recent approach that involves tagging bait proteins with an epitope before purification of both bait and interacting partners (prey) using antibody affinity. Proteins of the retrieved complex are digested into peptides, which are then subject to liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) for identification. Commonly used tags are GFP, 6xHis, Myc, StrepII, and FLAG. Tandem affinity purification (TAP) (29, 30) is a popular variant, wherein baits are engineered to contain two or more tags that permit isolation of protein complexes via sequential two-step purification in native conditions before identification. Sequential purification reduces contaminating proteins, but often results in poor yields. Interactomes by parallel affinity capture (iPAC) (31) aims to overcome sequential losses in yield by parallel purification of protein complexes interacting with multiply tagged baits. Eluted complexes resulting from different affinity matrices are analyzed separately by LC-MS/MS. Contaminants are identified by processing a tagless negative control, and matrix-specific contamination is subtracted in silico.
To overcome the limitations of using tagged bait proteins, two recent studies describe methods to isolate complexes under native conditions without tagging or overexpression of the bait (6,50). Both approaches use biochemical separation of complexes based on various physical properties of the complex, such as size, isoelectric point, and density, to enrich for complexes that are then analyzed by mass spectrometric methods. Co-eluting proteins are then taken to be part of the same complex. These less invasive approaches have potential to improve biological relevance, as cellular mechanisms are less perturbed by the experimental process, but at the price of an inherent bias toward detection of abundant and stable protein complexes (51). Technical aspects of the experimental pipeline may also yield false positive interactions. For example, cell lysis may cause proteins from different subcellular locations to interact in a nonphysiologically-relevant manner. Furthermore, the presence of a single bait in different complexes may result in their copurification, and lead to erroneous consideration of all possible relationships between the prey proteins involved as if they were part of the same complex. Quantitative proteomics approaches have been used to identify the resulting false interactions (52), via comparison between the abundance of interacting partners purified in association with a bait and in a control sample (Selbach and Mann (53)). The quantitation methods applied include SILAC (54), iTRAQ (55), ICPL (56), and ICAT (57), all of which use differential stable isotope labeling either in vivo or in vitro. Label-free quantitative methods have also been applied, with spectral counting, wherein the number of spectra acquired and assigned to peptides from the same protein is taken as a measure of abundance, being one of the most popular (58). 
The reader is referred to (59) for an in-depth review of the strengths and drawbacks of SILAC, iTRAQ, and ICAT for AP-MS data evaluation.
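As a minimal illustration of label-free spectral counting and of the bait-versus-control comparison described above, the following Python sketch tallies spectra per protein and forms a simple abundance ratio. The tuple layout of the peptide-spectrum matches, the function names, and the pseudocount are assumptions made for this example only; they do not correspond to any published tool.

```python
from collections import Counter

def spectral_counts(psms):
    """Count the spectra assigned to peptides of each protein.

    psms: iterable of (spectrum_id, peptide, protein) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    return Counter(protein for _spectrum_id, _peptide, protein in psms)

def bait_to_control_ratio(bait_counts, control_counts, pseudocount=1.0):
    """Ratio of spectral counts in the bait purification to the control.

    The pseudocount (an assumption) avoids division by zero for preys
    that are absent from the control sample.
    """
    return {
        protein: (count + pseudocount) / (control_counts.get(protein, 0) + pseudocount)
        for protein, count in bait_counts.items()
    }

# Toy data: P1 is seen only with the bait, P2 is more abundant in the control.
bait_psms = [("s1", "PEPA", "P1"), ("s2", "PEPB", "P1"), ("s3", "PEPC", "P2")]
control_psms = [("c1", "PEPC", "P2"), ("c2", "PEPD", "P2")]

bait = spectral_counts(bait_psms)
control = spectral_counts(control_psms)
ratios = bait_to_control_ratio(bait, control)
```

In this toy case P1 receives a ratio above 1 and P2 a ratio below 1, which mirrors the idea that background binders tend to be at least as abundant in the control as in the bait purification.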
The process of protein identification within mass spectrometry-based methods may also yield false-positive interactions. Many search algorithms are available with which to identify proteins from LC-MS/MS data, including MASCOT (60), SEQUEST (61), and X!Tandem (62), each with advantages and disadvantages (63) to the extent that combination of different algorithms has been used to lower the rate of erroneous identification (64-68). By contrast, false negatives may occur because of loss of complex members during the purification protocol and also failure to identify low abundance proteins, although various cross-linking methods are available to capture low-affinity interactions (69,70). Given multiple sources of erroneous protein-protein association, conventional assessment of AP-MS data set quality involves comparing observed prey proteins with bait interacting proteins via public databases. Prey proteins, observed experimentally to interact with the bait, are considered true positives (TP) if the same interaction has also been reported in public databases. Preys identified experimentally but previously annotated as not interacting with the bait are considered false positives (FP). Negatome (71) provides a repository for such negative examples. Prey proteins identified by AP-MS but which are neither published interactors nor previously reported as noninteracting proteins are treated as potential novel interactions (46).  (80)), collect interaction data via direct submission, expert curation, or computational text-mining of protein interaction data from publications under curator supervision. In addition to the limitations associated with automated text-mining (81,82), one report suggests that any two databases fully agree on only 42% of interaction data and 62% of proteins curated from the same publication (83). It is demonstrated, however, that such discrepancies are due predominantly to different curation methods, e.g. focus on specific interactions of interest, or mapping mixed-species interactions into a single model organism (84-86). Accordingly, the International Molecular Exchange Consortium (IMEx) has developed a set of curation rules to facilitate both consistency and quality, and now provides a nonredundant set of protein interaction data annotated to a uniform standard (87). Also in support of curation and data comparability, the Proteomics Standards Initiative (PSI) has developed a controlled vocabulary and file format for molecular interactions (PSI-MI) (88), which may be evaluated by any of a selected list of methods (PSISCORE) via the PSI common query interface (PSICQUIC) (89) (for a proof of PSICQUIC use see supplementary Table S3 and Table S4 and supplementary Fig. S1). iRefWeb (90) and iRefIndex (4) are systems allowing the merging and interrogation of protein interactions from more than 10 public databases.
In response to unacceptably high FPR in empirical data sets, computational evaluation methods have emerged over the last decade that may be used to assist quality assessment of protein-protein interactions. Such in silico evaluation methods explore different information sources, ranging from experimental output to public knowledge-bases, with different algorithmic approaches, including statistical over-representation, probability distribution modeling of varying complexity, supervised machine learning, and text-mining. Some methods act directly upon AP-MS experimental output to filter false positive interactions, whereas others have the wider aim of predicting protein interactions de novo, but may equally be applied as filters on experimental output. Accordingly, the broad spectrum of in silico filtering methods may be characterized by their input data (1), modeling and decision methods (2), type of protein relation (3), and motivation (4). These four distinctions can be used to separate techniques along broad lines and with the specific context of identifying biologically relevant protein interactions in AP-MS protein association data. Here, we review the computational evaluation of protein interactions via several contemporary methods, which mark paradigms from platform-based univariate assessment (PPI prediction based on one variable or parameter), through weighted combinations of multiple scores derived from related annotation and wider experimental sources, toward currently evolving methods that make use of adaptively-weighted, predictive ensembles, derived from all three information sources (platform, annotation and wider data; Fig. 1).
Platform-Based Methods-A plethora of experimental platform-based scoring methods for AP-MS data exemplify the first of the three paradigms introduced above. Platform-based assessments are specifically motivated by FP removal from data derived via AP-MS-based studies (distinction 4, defined earlier), and filter direct interactions (3) via computational assessment of platform output (1). They differ in the statistical methods employed for PPI appraisal (2), especially in the use or otherwise of a subset of "true" and "false" interactions to guide decision thresholds placed on their output.
The Comparative Proteomics Analysis Software Suite, CompPASS (91) was developed to identify high confidence candidate interacting proteins (HCIPs) in AP-MS experiments via mass spectrometry spectral counts (91), and has been applied to human deubiquitinating enzymes (DUB) (91), the human autophagy interaction network (AIN) (92), and the human interaction network for ER-associated degradation (INfERAD) (93). CompPASS employs univariate descriptive statistics to assess observed interactions according to functions of the number of baits evaluated and the number of both purifications and replicate runs in which the prey protein is detected. The score is assessed by comparison against randomized data, does not employ control experiments, and is designed for nonreciprocal purifications, giving lower scores to proteins found in multiple experiments. The suite is available online and, if queried with a protein not present in the evaluated experiments, displays results for similar proteins via protein name synonyms. Dependence of the algorithm upon spectral counts may cause it to be adversely affected by inefficiencies in protein digestion, peptide ionization, and performance within an MS/MS experiment. These factors have particular influence on the quantitation of low abundance proteins (94 -97), highlighting a potential weakness of experiment-specific methods.
The Significance Analysis of INTeractome (SAINT) system (98) was first introduced to filter kinase-phosphatase interaction data (KPI; 1844 interactions between 887 protein partners) (99) and has been applied subsequently to smaller data sets (98). Like CompPASS, SAINT evaluates the probability of interaction between proteins as a function of spectral counts. The latest version assumes separate Poisson spectral count distributions for bait and prey: one to represent true interactions (test experiments) and another for false interactions (negative control purification experiments). In the absence of a control data set (known false interactions), preys are divided into high- and low-frequency groups according to a user-specified threshold. The final score reflects the probability of observed spectral count belonging to the true interaction distribution, compared with that of false interactions. For optimal evaluation, the authors suggest 15 to 20 different bait purifications (supplementary Data (99)). In comparison with CompPASS, SAINT has displayed increased overlap with published data (98). It relies, however, upon the use of robust controls, good overlap of spectral measurements over successive LC-MS/MS runs, and the assumption that preys in control samples are characterized by lower quantitation values than those in experimental samples. In one published study, for example, peptide lists from two technical replicates were observed to overlap by 35-60% largely because of the stochastic nature of LC-MS/MS selection of peptide ions for fragmentation in successive runs (97).
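The idea of scoring a spectral count by its probability of belonging to a "true" rather than a "false" Poisson distribution can be sketched with a two-component mixture. The Python example below is a deliberately simplified illustration and not the SAINT model itself: the fixed rates `lam_true` and `lam_false` and the prior `prior_true` are hypothetical inputs, whereas a full implementation would have to estimate such parameters from the test and control purifications.

```python
import math

def poisson_log_pmf(count, rate):
    """Log-probability of observing `count` under a Poisson(rate) distribution."""
    return -rate + count * math.log(rate) - math.lgamma(count + 1)

def posterior_true_interaction(count, lam_true, lam_false, prior_true=0.5):
    """Posterior probability that a spectral count comes from the 'true' component.

    lam_true, lam_false: assumed mean spectral counts (> 0) for true and
    false interactions; prior_true: assumed prior proportion of true pairs.
    """
    log_true = math.log(prior_true) + poisson_log_pmf(count, lam_true)
    log_false = math.log(1.0 - prior_true) + poisson_log_pmf(count, lam_false)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_true, log_false)
    w_true = math.exp(log_true - top)
    w_false = math.exp(log_false - top)
    return w_true / (w_true + w_false)

# A prey with 10 spectra scores close to 1; a prey with 0 spectra close to 0.
high = posterior_true_interaction(10, lam_true=8.0, lam_false=1.0)
low = posterior_true_interaction(0, lam_true=8.0, lam_false=1.0)
```

The sketch also makes the stated dependence on controls visible: if the control-derived rate is close to the test rate, the posterior stays near the prior and the score carries little information.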
It is noteworthy that, although most applications of CompPASS and SAINT have involved spectral counting, both approaches can be modified to take other types of quantitation data. A recent development, SAINT-MS1 (100), models MS1 intensity data to filter AP-MS PPI data in a similar manner. For an in-depth review of SAINT see Nesvizhskii (101). Alternatives to both methods are provided in the form of in-house analytical pipelines, some recent examples of which are outlined below.
Guruharsha and coworkers (102) evaluate the confidence of ~3500 D. melanogaster affinity purifications to extract 556 protein complexes in several stages. In the first instance, spectral counts are modeled using the hypergeometric probability distribution as in Hart et al. (103). Using a randomized set for FDR computation, a 0.05% threshold reduces the set to 10,969 binary interactions among 1297 proteins before identification of complexes via a Markov clustering algorithm (104).
DAVID Functional Annotation Tools (105), KEGG (106), and modENCODE RNA-Seq data (107) provide additional functional analyses. The hypergeometric distribution score shows improved performance when compared with both Hart et al. (103) and the socio-affinity index (SAI) (32), two methods based on frequency of observation of specific bait-prey combinations, and also when compared with CompPASS and SAINT, methods based on spectral counts. The analysis pipeline delivers scores expressing confidence in protein identification, protein complex co-membership, and protein co-expression.
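To make the role of the hypergeometric distribution concrete, the Python sketch below computes a generic upper-tail probability for the co-occurrence of two proteins across purifications. It is an illustration only: the published score is built on spectral counts and its exact construction is not reproduced here, and the parameter names (`total`, `with_a`, `with_b`, `shared`) are assumptions introduced for the example.

```python
from math import comb

def hypergeom_upper_tail(shared, total, with_a, with_b):
    """P(X >= shared) for X ~ Hypergeometric(total, with_a, with_b).

    total:  number of purifications (hypothetical unit of observation)
    with_a: purifications in which protein A is detected
    with_b: purifications in which protein B is detected
    shared: purifications in which both are detected (shared >= 0)
    """
    upper = min(with_a, with_b)
    favourable = sum(
        comb(with_a, i) * comb(total - with_a, with_b - i)
        for i in range(shared, upper + 1)
    )
    return favourable / comb(total, with_b)

# Two proteins seen in 3 and 4 of 10 purifications, co-occurring in 3.
p_value = hypergeom_upper_tail(shared=3, total=10, with_a=3, with_b=4)
```

A small tail probability indicates that the two proteins co-occur more often than expected by chance; thresholds on such values would, as in the text, be calibrated against a randomized data set.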
A further recent study, of membrane-protein complexes in S. cerevisiae, uses the "purification enrichment" (PE) score (108) based on observation of two pairs of proteins (bait and prey) in the same and in separate purifications, followed by application of Markov clustering to a combined set of published and newly detected complexes to deliver over 500 putative protein complexes (5). Similarly, Havugimana et al. (6) recently applied Pearson correlation analysis to evaluate spectral counts in combination with the analysis of published protein networks to integrate new experimental data and build a core set of human protein complexes.
Decontaminator removes contaminants from AP-MS experiments by modeling Mascot protein identification scores (109), as demonstrated by application to a set of 14 "induced" and 14 "noninduced" TAP-tagged RPAP3 complexes in HEK 293 cells (purification protocol (110)). Between five and 516 preys, with a mean of 135, were detected per bait in the induced samples, whereas between 206 and 626 preys, with a mean of 316 proteins, were detected in the noninduced samples. Noise models are constructed on Mascot scores of noninduced samples, which feature higher mean and range of prey detection frequencies, and their induced counterparts. The resulting posterior probabilities of true PPIs are used subsequently to compute significance of each potential prey interaction. Decontaminator has shown improved performance in comparison to five other methods: two variations of SAINT, and three descriptive statistics, on the evaluation of a combined BioGRID (78) and HPRD (73) data set (109) and may be applied even when few control experiments are available. It should be noted, however, that strong conclusions are elusive when comparing algorithms designed for use with large unconnected networks, such as SAINT, with those designed for application to smaller data collections, such as Decontaminator. The provision of an alternative to conventional subtraction of preys identified in a control sample from those of an experimental sample is welcome, but the method remains dependent upon both biological sample (i.e. protein purification) and the analytical system employed (i.e. mass spectrometric set-up and protein detection levels). In addition, the universal assumption that FPs are defined by lower Mascot scores than those of true interactors may not always generalize sufficiently in the presence of inconsistencies in peptide detection and mass-spectra-peptide-protein matching between runs (111,112).
Finally, two recent methods for platform-based filtering of false positive interactions warrant introduction. Quantitative BAC InteraCtomics (QUBIC) uses either SILAC or "label-free" quantitation in MaxQuant (113) to differentiate between false positives, background binding proteins and bait interaction partners (114), and Mass spectrometry interaction statistics (MiST) combines protein abundance (peak intensity), reproducibility, and specificity (observance of a prey uniquely for one bait) by principal component analysis (115) in an approach that proved competitive against both CompPASS and SAINT when applied to AP-MS data from an experiment describing host-pathogen interactions (116).
Databases and Annotation Confidence Scoring Systems-It is clear that the experiment or platform-specific direct assessment of binding assay output limits the amount of data available from which to derive interaction statistics of sufficient power to discriminate fully between true and false interactions.
Several computational methods aim to provide more generic tools that allow scoring and inference of interactions beyond those putatively identified by single AP-MS experiments. The following recent, well-known, PPI prediction systems all derive a series of single scores from public data sources (distinction 1, Introduction) and combine them to assess putative interaction (4) of varying degrees of association (3). The methods described progress from fixed to adaptive weighting when combining PPI scores, and from data description to use of "high-confidence" data subsets to inform decision thresholds placed on them (2).
The Molecular Interaction (MINT) repository evaluates stored protein interactions according to experimental information quality (117). The evaluation score ranges between 0 and 1 to describe the strength of association between two proteins, and is a weighted combination of: experiment size, experiment type, published ortholog interactions, and number of supporting publications, via an exponential function with initial slope provided by an empirically set exponent. Experiments with fewer than 50 interactions receive an experiment size weight of 1; larger experiments receive 0.5. Experiment type is weighted 1 for experiments supporting a direct interaction, or 0.5 for affinity purifications. Ortholog scores extracted from InParanoid (118), a tool to extract orthologs from multiple organisms using a BLAST-based algorithm, are multiplied to yield a single ortholog weight ranging between 0 and 1. Interactions with more than five supporting publications obtain a weight greater than or equal to 1. The latest version of MINT introduces a "community recognition" weight (75) via a combination of experiment type and ratio between citations and reported interactions in each paper. The improved score rewards focused interaction studies (e.g. Y2H) in highly cited publications. Ease of implementation and functional clarity enable application of this approach independent of experimental method, which is particularly useful in the fast filtering of large data sets. The method is not sufficiently sensitive for detailed investigation of a smaller set of proteins, however, and the fixed a priori thresholds on publication count, experiment size, and experiment type may require revision to prevent large-scale experiments (e.g. AP-MS) being disadvantaged within this scoring system.
The Interaction database (IntAct) (74) employs a simple system to evaluate constituent experimentally observed PPIs. All IntAct experiments are clustered (manuscript in preparation (74)) to produce a nonredundant protein subset (i.e., a set containing only one entry for each protein) and the confidence of each protein-protein pair is scored via the sum of fixed weights relating to detection method, interaction type, and number of database publications in which the interaction has been observed. For example, weights for the detection method are derived by expert curators from the PSI-MI controlled vocabulary, collapsed to top-level terms in the vocabulary hierarchy, and range from 0.2 (colocalization) to 3 (biochemical). The combined score is normalized to scale between 0 and 1. The algorithm is designed to work on any PSI-formatted file annotated to MIMIx standards (119), and future directions include the incorporation of further properties from PSI-MI controlled vocabularies. A precursor of such methods, the Database of Interacting Proteins (DIP) (76), used three scoring systems based on expression profiles (12), paralogs (12), and domain-domain interactions (120) respectively to evaluate binary physical PPI in experimentally derived collections.
The Search Tool for Retrieval of Interacting Genes (STRING) (121) employs multiple scoring algorithms and annotation sources to retrieve and evaluate possible protein pairs and reject pairs deemed not to interact according to lack of data-driven evidence. Eight distinct scores are incorporated to assess interaction partners for every protein stored in the accompanying database: neighborhood, (gene) fusion, (phylogenetic) co-occurrence, homology, coexpression, experiments, databases, and textmining. Some are novel to STRING itself, whereas others already exist as stand-alone scoring algorithms (122,123).
For example, the Homology score (141) employs external ortholog assignments stored in Clusters of Orthologous Groups (COGs), with the interaction confidence score between two proteins from different COGs assumed to be either partially or wholly (depending on mode of use) transferable to any pair of proteins belonging to the same two COGs. Co-expression (141) combines correlation of gene expression across multiple transcriptional profiles, retrieved via ArrayProspector (142), with the context of known interaction pathways (KEGG) to assess putative PPIs for consistency, accompanied by familiar advantages and disadvantages of using microarray data to infer interactions (143,144) and transcription to approximate resulting gene or protein activity. The Experiments score (141) uses protein interactions imported from databases (some described above), including MINT (75), BIND (77), DIP (76), BioGRID (78), IntAct (74), and PDB (145). The imported experiments are evaluated based on the size of the experiment: small-scale experiments get a fixed empirical score, whereas protein complexes obtained from large-scale studies are evaluated as the log ratio of the number of studies supporting the interaction and the number of studies supporting the contrary (141). Database evidence (126,141) uses external published pathway information from databases including KEGG (106), Reactome (146), and BioCarta, and derives from the frequency with which each protein pair appears in the same pathway. Finally, the Textmining score (126,147) searches abstracts from SGD (148), OMIM (149), The Interactive Fly, and PubMed for published regulatory gene interactions, parsing for co-occurring gene names and interaction information using Natural Language Processing tools (123). Ongoing improvements are supported by community initiatives such as the Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE).
Each individual subscore described above is benchmarked against a common TP reference set, KEGG (106) pathway membership, and represents on its own the confidence of a true interaction. A naïve Bayesian combination of these subscores under the assumption of independence (141) is employed subsequently to produce a single confidence score scaled to range between 0 and 1 (141). STRING thereby combines multiple annotation sources evaluated via a range of computational methods into one composite interaction score.
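A combination of independent evidence channels of the kind described above can be sketched as the probability that at least one channel is correct. The Python example below is a minimal sketch under that independence assumption; it should not be read as the exact STRING procedure, which may apply further corrections that are not described in this text, and the function name is introduced here for illustration.

```python
def combine_independent_scores(subscores):
    """Combine per-channel confidences assuming the channels are independent.

    Each subscore is interpreted as the probability that the interaction is
    true given that channel alone; the result is 1 - prod(1 - s_i).
    """
    remaining_doubt = 1.0
    for score in subscores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("subscores must lie between 0 and 1")
        remaining_doubt *= 1.0 - score
    return 1.0 - remaining_doubt

# Two moderately confident channels reinforce each other.
combined = combine_independent_scores([0.5, 0.5])
```

Under this scheme the composite score is never lower than the largest subscore, which matches the observation that combining channels lends robustness beyond that of each individual constituent.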
In relation to AP-MS experimental data, putative interactions may be filtered via intersection with a list of proteins suggested by STRING to interact with the same bait protein, taking composite scores and scoring methods into consideration upon interpretation of the result. A study in Mycoplasma pneumoniae used STRING in conjunction with an experimental scoring scheme, elevating some lower experimental scores, and thereby validated lower-scoring empirical interactions using orthogonal evidence (45). A potential weakness of the STRING framework is that it is gene-orientated and, therefore, does not acknowledge specific protein isoforms or posttranslational modifications. Accordingly, a measure of care should be taken during interpretation, especially when applied to filter the output of AP-MS experiments. In mitigation, the composite interaction score reflects individual subscores in a transparent fashion, which greatly assists interpretation, and the combination of subscores across varied approaches and biological information lends robustness beyond that of each individual constituent.
In the above sections, the scoring systems described for potential application to filter PPIs suggested by AP-MS experiments have progressed from descriptive statistics on individual data sets to predictive statistics based on wider experimental data, moving toward conjoint analysis of multiple variables or parameters and the derivation of decision thresholds from wider annotation data.
Composite Methods-Recent techniques to evaluate true physiological interactions tend toward the adaptively weighted combination of multiple PPI assessments "learned" from a variety of information sources. A good example is provided by the Functional Coupling database, FunCoup (150), which constructs global functional coupling networks for the most-researched eukaryotes based on proteomic and genomic data. The input sources are listed and given brief descriptions below, with references to related works: Co-expression (MEX) is evaluated from normalized microarray data by Pearson and Spearman rank correlations. Phylogenetic signatures (PHP) based on the major clades (Fungi, Plantae, Animalia) are employed to calculate log-likelihood ratios pertaining to shared phylogeny. Protein-protein interaction data (PPI) are evaluated based on number of papers, experiment type and experiment size. Data linked to one paper, one experiment and multiple preys are penalized, whereas data with multiple papers and multiple experiments are favored. GO terms relating to SCL are weighted based on their mutual information, an information theory quantifier (159) of the codependence of each GO term in the evaluated GO term pair, and number of proteins assigned to each term. The PEX score evaluates Human Protein Atlas expression and subcellular localization data using the same mutual information quantification principle as for SCL. TFB and MIR compute scores directly proportional to the ratio of binding sites shared by two proteins over all binding sites associated with the pair. The UniDomInt descriptor incorporates nine domain interaction databases (158) and is incorporated directly. Likewise, GIN scores are incorporated from genetic interaction profiles suggested by Costanzo et al. (157).
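The ratio underlying the TFB and MIR inputs, binding sites shared by two proteins over all binding sites associated with the pair, can be read as a Jaccard index. The Python sketch below follows that reading, which is an interpretation made for this example; the text states only that the scores are directly proportional to the ratio, so any scaling applied by FunCoup is not represented, and the identifiers are hypothetical.

```python
def shared_site_ratio(sites_a, sites_b):
    """Fraction of binding sites shared by two genes over all sites of the pair.

    sites_a, sites_b: collections of regulator identifiers (for example,
    transcription factors or miRNAs) that bind each gene. Returns 0.0 when
    neither gene has any annotated site.
    """
    a, b = set(sites_a), set(sites_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical regulators: one shared site out of three in total.
ratio = shared_site_ratio({"TF1", "TF2"}, {"TF2", "TF3"})
```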
FunCoup delivers a discrete evaluation of four classes of functional coupling: physical interaction, protein complex membership, metabolic link, and signaling link. To map combined, un-normalized metric scores to these four classes, FunCoup employs four corresponding training sets built from data retrieved from IntAct, HPRD, BIND, KEGG, and UniProt. The four sets correspond to: physical interaction, comprising proteins reported in one PPI experiment and belonging to the same KEGG pathway, or reported in multiple PPI experiments; protein complex, comprising members of the same UniProt complex; metabolic link, comprising members of either a small KEGG metabolic pathway, or partners in a specified minimum number of (organism-specific) metabolic pathways; and signaling link, defined similarly to metabolic link, but for signaling pathways. Corresponding negative training sets and the pitfalls associated with them (160) are avoided by use of Bayesian methods to implicitly define a background evidence probability (with background defined as sets of protein pairs less likely to interact when compared with the 'gold standard' sets of true interacting partners). An interaction model is created from each training set and outputs are combined using a naïve Bayesian network to deliver a single confidence value that ranges between 0 and 1 (161).
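The naïve Bayesian combination step can be sketched as follows. This is an illustration of the general technique rather than FunCoup's implementation: it assumes that each evidence type contributes a natural-log likelihood ratio (gold standard versus background), that the evidence types are conditionally independent, and that a prior probability of coupling is supplied by the user. The function name and the default prior are hypothetical.

```python
import math


def combine_llrs(llrs, prior=0.5):
    """Combine per-evidence log-likelihood ratios into a confidence in (0, 1).

    Under the naive independence assumption, log-likelihood ratios add, so
    posterior log-odds = prior log-odds + sum of the individual ratios.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior)) + sum(llrs)
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Hypothetical evidence: one source favors coupling 4:1, another disfavors it 1:2.
confidence = combine_llrs([math.log(4.0), math.log(0.5)])
```

With an even prior, the two hypothetical sources yield posterior odds of 2:1, that is, a confidence of two thirds. Evidence that is equally likely under both the gold standard and the background contributes a ratio of zero and leaves the confidence unchanged.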
FunCoup scores have displayed a Spearman rank correlation of 0.19 when compared with STRING scores, with the relatively low positive correlation attributed to differences in methodology and input data (161). A positive correlation is expected, as both systems use external annotation to infer PPIs. It should be noted, however, that Spearman rank correlation is sensitive to changes in rank rather than to changes in the scores themselves, so that a small score change that alters the ranking (e.g., from 0.88 to 0.89) has a larger influence on the correlation than is merited by the biological context.
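The sensitivity of rank correlation to small score differences can be shown with a short worked example. The scores below are invented for illustration and are not taken from FunCoup or STRING; the two hypothetical systems differ only in that two near-identical scores (0.88 and 0.89) are exchanged.

```python
import math


def average_ranks(values):
    """Return 1-based ranks in ascending order, with ties given the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical confidence scores for the same four protein pairs.
system_a = [0.95, 0.89, 0.88, 0.10]
system_b = [0.95, 0.88, 0.89, 0.10]

rho = spearman(system_a, system_b)  # 0.8: the rank swap is penalized
r = pearson(system_a, system_b)     # above 0.99: the scores barely differ
```

A difference of 0.01 between two scores lowers the rank correlation from 1 to 0.8 in this four-pair example, whereas the correlation of the raw scores remains close to 1.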
Van Haagen et al. (162) combine text mining (Peregrine (163)), Gene Ontology (GO) over-representation analysis, microarray data (COXPRESdb (164)), tissue-specific gene expression data (TiGER (165)), and domain-domain interaction (DOMINE (166)) information into a single PPI prediction system, and demonstrate their approach by inferring potential interaction partners for dysferlin and huntingtin (162). MEDLINE abstracts are processed via text mining with a dictionary of gene name synonyms, spelling variations of concepts, and protein-specific context profiles (163). Microarray data (COXPRESdb (164)) are evaluated using Pearson correlation coefficients to quantify the co-expression of each gene pair. The DDI score appraises binary PPIs via the number of interacting domains according to DOMINE. The information content of the lowest common GO term ancestor (LCA) is employed as a similarity measure between two proteins. Fisher's method is used to combine the individual database scores (162). A reported increase in performance over STRING (162) may be caused by the latter method's bias toward reporting all potentially functionally linked proteins, plus differences in annotation sources and text-mining procedures. The systems also differ in their treatment of microarray data, as STRING uses ArrayProspector whereas van Haagen et al. use COXPRESdb and TiGER to assess co-expression across tissue-specific expression data. A potential improvement might be obtained by taking into consideration the different evidence codes that indicate the source of information for each GO term assignment.
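Fisher's method for combining evidence can be sketched as below. The sketch assumes that each individual database score has first been expressed as a p-value and that the sources are independent; how van Haagen et al. derive those values is not described here, and the example p-values are hypothetical. The statistic X = -2 Σ ln p follows a chi-square distribution with 2k degrees of freedom for k sources, and for even degrees of freedom the upper-tail probability has a closed form, so no statistics library is needed.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Returns the upper-tail probability of X = -2 * sum(ln p) under a
    chi-square distribution with 2k degrees of freedom, computed as
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one protein pair from two evidence sources.
combined = fisher_combine([0.05, 0.05])
```

Two sources that are each only marginally significant at 0.05 combine to a p-value of roughly 0.017, which illustrates how concordant weak evidence from several databases can support an interaction more strongly than any single source.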
Further online resources that combine data types and eukaryotic gene networks include GeneMANIA (167), which covers A. thaliana, C. elegans, D. melanogaster, M. musculus, H. sapiens and S. cerevisiae, and bioPIXIE, which builds S. cerevisiae networks based on gene expression, genetic and physical interactions, localization annotation, and literature curation (168). Alternatively, Kiel and coworkers describe a new conceptual analysis pipeline that combines experimental proteomic data, literature mining, computational analyses, and structural information to generate a multiscale signal transduction network based on tissue-specific gene expression and domain-domain interaction data for the study of rhodopsin and its interactions (169).
The systems outlined in this section provide examples of contemporary PPI inference methods to which the filtering of false-positive (and identification of false-negative) interactions from AP-MS experiments lends a dual purpose. This dual purpose also frames the present state of the field, because it appears that successful computational approaches to the postreduction of FPR in AP-MS experiments may require a balance to be struck between the specificity of direct analysis on experimental output and the strong phenotypic relationships inferred from abundant annotation and large databases of generic curated PPIs.
Summary and Future Prospects-The approaches described in this review may all be employed to assist the interpretation of empirical AP-MS-derived protein interaction data, in particular by filtering false-positive interactions. The methods presented differ in motivation and in their consideration of what constitutes an interaction, ranging from acute assessment of direct interactions from a single experimental output to generic PPI inference from proteome-wide annotation (see Table I). They also differ in how they evaluate and decide, from single descriptive statistics to distinctions between exemplar PPIs and noninteracting counterparts learned across multiple descriptors. Each approach has strengths and weaknesses, and arguably there exists considerable scope to refine current methodologies.
In the immediate term, increasing generation and expansion of published interactomes is likely to encourage the evaluation of multiple data sets to reveal system-wide trends. Regardless of the specific approach taken, weighted combinations of multiple PPI scores, and the decision thresholds placed upon them, provoke new considerations regarding the selection and representation of training data that exemplifies the distinction sought between proteins that interact and those that do not. In this context, avenues of potential improvement include: further development of combined scoring systems to provide multisource data representations for PPI inference via supervised machine learning or modeling techniques based on multiple sources of information; the coanalysis of readily available annotation data to augment a limited quantity of more specific experimental data obtained via platform or experiment-specific methods (data fusion); and the application of training data design strategies from other disciplines, such as high-throughput molecular screening (170), to guide the production of more appropriate exemplar data collections from which to discriminate between true and false PPIs.
The present trend of combining multiple information sources in pursuit of protein interaction filters that are (a) specific to experimental output, and also (b) reliable and comparable across multiple experiments and platforms, comprises a genuinely interdisciplinary endeavor. The required combination of expertise includes computer scientists and bioinformaticians, to design and build multisource computational filters, working alongside developers of the experimental methods employed and the expert knowledge of curators, biologists and experimentalists.
In conclusion, the importance of determining sets of interacting proteins and how these interactions change in response to a highly dynamic cellular environment cannot be overstated. The field has impact on many fundamental areas of cell biology research and beyond, including the selection of drug targets and the elucidation of therapeutic mechanisms of action, which play an increasingly important role in human health. Continuing platform development may alleviate the unacceptably high FPRs that presently impede the unfiltered use of common empirical methods, but sources of noise and unwanted variance are inherent to the underpinning technologies and are likely to perpetuate a need for computational postanalysis of experimental output. Accordingly, the development and implementation of sophisticated computational tools to assist in removal of erroneous interactions is, and will continue to be, the lynchpin of such technologies.