Predicting Protein-Protein Interactions by Association Mining

Information Systems Frontiers

Abstract

Identifying protein-protein interactions is a key problem in molecular biology. Currently, interactions cannot be reliably predicted on a proteome-wide scale, but direct and indirect evidence for interactions is increasingly available from high-throughput interaction detection methods, gene expression microarrays, and protein annotation projects. In this paper, we propose an association mining approach to integrating these diverse types of evidence. We apply this approach to a number of datasets consisting of interacting and non-interacting protein pairs annotated with different types of evidence. We identify patterns that distinguish interacting from non-interacting protein pairs, and use these patterns to assign a confidence level to proposed interactions.
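
The general idea of mining patterns over annotated protein pairs and using them to score proposed interactions can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not specify. It enumerates small evidence patterns by brute force (not Apriori or FP-growth), computes the support and confidence of each rule "pattern -> interacting", and scores a pair by its best matching rule. The evidence names, the toy data, and the functions `mine_rules` and `score_pair` are all hypothetical and invented for illustration.

```python
from itertools import combinations

# Toy, made-up data: each protein pair is (set of evidence items, interacts?).
# The evidence names are hypothetical placeholders, not the paper's features.
PAIRS = [
    ({"coexpressed", "same_go_process", "y2h_hit"}, True),
    ({"coexpressed", "same_go_process"}, True),
    ({"coexpressed", "y2h_hit"}, True),
    ({"same_go_process"}, False),
    ({"coexpressed"}, False),
    ({"y2h_hit"}, False),
    (set(), False),
    ({"coexpressed", "same_go_process"}, False),
]


def mine_rules(pairs, min_support=0.2, max_size=2):
    """Map each frequent evidence pattern to (support, confidence).

    support    = fraction of all pairs that contain the pattern
    confidence = fraction of those pairs that are labelled interacting
    """
    items = sorted(set().union(*(evidence for evidence, _ in pairs)))
    n = len(pairs)
    rules = {}
    for size in range(1, max_size + 1):
        for pattern in combinations(items, size):
            p = frozenset(pattern)
            covered = [label for evidence, label in pairs if p <= evidence]
            support = len(covered) / n
            if not covered or support < min_support:
                continue  # pattern is too rare to trust
            rules[p] = (support, sum(covered) / len(covered))
    return rules


def score_pair(evidence, rules, default=0.0):
    """Score a proposed pair by the highest-confidence rule it matches."""
    matching = [conf for p, (_, conf) in rules.items() if p <= evidence]
    return max(matching, default=default)


if __name__ == "__main__":
    rules = mine_rules(PAIRS)
    for pattern, (support, conf) in sorted(rules.items(), key=lambda r: -r[1][1]):
        print(sorted(pattern), f"support={support:.3f}", f"confidence={conf:.3f}")
    print(score_pair({"coexpressed", "y2h_hit"}, rules))
```

Taking the best matching rule is only one possible design choice; averaging several matching rules or calibrating the resulting scores are alternatives, and the sketch does not claim to reflect which one the paper uses.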

Cite this article

Max Kotlyar, Igor Jurisica. Predicting Protein-Protein Interactions by Association Mining. Inf Syst Front 8, 37–47 (2006). https://doi.org/10.1007/s10796-005-6102-8
