Elsevier

Journal of Proteomics

Volume 171, 16 January 2018, Pages 87-94
Review
Merging in-silico and in vitro salivary protein complex partners using the STRING database: A tutorial

https://doi.org/10.1016/j.jprot.2017.08.002

Highlights

  • STRING database is a useful tool for studying protein-protein interactions.

  • It can collect and integrate protein association data from many organisms.

  • Data about both known and predicted protein-protein interactions can be obtained.

  • STRING database can help in elucidating the Human Interactome.

Abstract

Protein-protein interaction is a common physiological mechanism for protection and actions of proteins in an organism. The identification and characterization of protein-protein interactions in different organisms is necessary to better understand their physiology and to determine their efficacy. In a previous in vitro study using mass spectrometry, we identified 43 proteins that interact with histatin 1. Six previously documented interactors were confirmed and 37 novel partners were identified. In this tutorial, we aimed to demonstrate the usefulness of the STRING database for studying protein-protein interactions. We used an in-silico approach along with the STRING database (http://string-db.org/) and successfully performed a fast simulation of a novel constructed histatin 1 protein-protein network, including both the previously known and the predicted interactors, along with our newly identified interactors. Our study highlights the advantages and importance of applying bioinformatics tools to merge in-silico tactics with experimental in vitro findings for rapid advancement of our knowledge about protein-protein interactions. Our findings also indicate that bioinformatics tools such as the STRING protein network database can help predict potential interactions between proteins and thus serve as a guide for future steps in our exploration of the Human Interactome.

Significance

Our study highlights the usefulness of the STRING protein database for studying protein-protein interactions. The STRING database can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an easy-to-use interface.

Introduction

Studies of the biological roles of proteins in a system begin with the characterization of the individual proteins. However, proteins usually interact with other proteins in vivo for many purposes and in different ways. Proteins are likely to interact directly by binding to other proteins as a defense mechanism against degradation, to modulate the function of one or both partners, or to rely on a partner as a delivery system for transport. In addition, proteins communicate through indirect interactions to regulate each other's production and half-life, to exchange reaction products, and to activate or deactivate different signaling pathways, thus contributing to the functioning of the whole organism. The broad combination of these direct and indirect interactions determines the functional associations among proteins [1], [2], [3], [4], [5], [6].

Different experimental methods can be used to identify proteins that interact directly to form heterotypic complexes. Co-immunoprecipitation (Co-IP) and/or pull-down assays coupled with mass spectrometry (MS) are most commonly used for this purpose. Both approaches are very efficient for the recovery of protein complexes when the protein partners exhibit strong and stable interactions. In Co-IP, the target protein precipitated by the antibody is used to co-precipitate a binding partner or protein complex from a mixture. In the pull-down assay, on the other hand, a “bait” protein is used instead of an antibody to extract, from the mixture, proteins that bind to it or protein complexes that contain such proteins. Combining both methods identifies heterotypic complex partners efficiently and helps avoid false-positive results.

After separating the proteins of interest from a mixture and identifying each amino acid constituent and post-translational modification, all efforts turn to the difficult task of characterizing protein-protein interactions and their biological roles. Many databases and online resources have been created to assist in this titanic mission. First, evidence-based protein-protein interactions are curated from the published literature by members of the UniProt [7] and IMEx [8], [9] consortia. Based on this information, databases such as BioGRID [10], HINT [11], iRefWeb [12], and APID [13] collect all data about experimentally proven interactions. Finally, databases designed to build on top of the experimental data about direct protein-protein interactions add data about indirect and predicted protein-protein interactions to create a more comprehensive network. Some examples from this last group of bioinformatics tools are GeneMANIA [14], Integrated Multi-species Prediction [15], Integrated Interactions Database [16], HumanNet [17], FunCoup [18], and the STRING database [19], [20]. In these latter databases, scores are provided to weight interactions according to their confidence.

In the STRING database, for example, each stored protein-protein interaction has a score (between zero and one) representing its confidence. The supporting evidence for each interaction is divided into seven “evidence channels.” The experiments channel consists of laboratory results that demonstrate protein interactions (including biochemical, biophysical, and genetic experiments), mostly data from the IMEx consortium and BioGRID. The database channel is manually curated and imported from pathway databases. The textmining channel indicates possible interactions between proteins that are mentioned together in the same PubMed abstracts, in an in-house selection of more than 3 million full-text articles, and in other text collections [21], [22]. The evidence is considered stronger if a concept such as “binding” or “phosphorylated by” is found to connect the mentioned proteins. The coexpression channel shows normalized, pruned, and correlated gene expression data from many experiments [23]. The neighborhood channel is a genome-based prediction channel, where genes are given scores if they are frequently seen in each other's genome neighborhood. In the fusion channel, pairs of proteins are given association scores if there is at least one organism in which their respective orthologs have fused into a single protein-coding gene. The co-occurrence channel evaluates the phylogenetic distribution of orthologs of all proteins in an organism; two proteins with high similarity in this distribution are assigned a score [24]. Finally, in addition to the seven listed evidence channels, the STRING database also benefits from the transfer of evidence from one organism to another, since orthologs of interacting proteins in one organism often also interact in other organisms; this is called “interolog” transfer [25], [26].
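To make the idea of per-channel confidence concrete, the following minimal Python sketch combines channel scores under the assumption that the channels are independent, using 1 - ∏(1 - s). This formula is an illustrative assumption and is not stated in the text above; STRING's own scoring procedure applies further corrections that are not reproduced here. The function name `combine_channel_scores` and the example values are hypothetical.

```python
from math import prod

# The seven evidence channels named in the text.
CHANNELS = (
    "experiments", "database", "textmining", "coexpression",
    "neighborhood", "fusion", "cooccurrence",
)


def combine_channel_scores(scores):
    """Combine per-channel confidences (each in [0, 1]) as 1 - prod(1 - s).

    Assumption: channels are treated as independent lines of evidence.
    This is an illustration only, not STRING's exact procedure.
    """
    for name, value in scores.items():
        if name not in CHANNELS:
            raise KeyError(f"unknown evidence channel: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name} must lie in [0, 1]")
    return 1.0 - prod(1.0 - s for s in scores.values())


# Made-up example values, not taken from STRING.
example = {"experiments": 0.5, "textmining": 0.5}
print(round(combine_channel_scores(example), 3))
```

Under this assumption, two moderately confident channels yield a higher combined confidence than either alone, which matches the intuition that independent lines of evidence reinforce each other.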

In a previous study, our group identified 43 proteins in human saliva that participate in heterotypic complexes with histatin 1 [27], a salivary protein with many functions in the oral cavity, including strong antibacterial and antifungal functions. To exemplify the use of the STRING database in salivary proteomics research, in this tutorial we provide a detailed guide on how we used the STRING protein network database to combine the results of an in-silico approach for identifying the proteins that interact with histatin 1 with the results of our previous in vitro experiments, in which Co-IP and pull-down assays, followed by MS, were used to identify the interactors of histatin 1 in saliva. Herein, we demonstrate how this useful web tool, the STRING database, can be used to simulate protein-protein networks. We also discuss the importance and advantages of combining in-silico and in vitro approaches to provide a more comprehensive view of the real protein hub.
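The merging step described above can be sketched with plain set operations. The snippet below is a minimal illustration, assuming the experimental hits and the database-derived partners are available as lists of protein identifiers. The function name `merge_interactors` and all identifiers are hypothetical placeholders, not results from the study.

```python
def merge_interactors(in_vitro, in_silico):
    """Classify interactors by which approach reported them."""
    in_vitro, in_silico = set(in_vitro), set(in_silico)
    return {
        # Found in the experiment and already present in the database network.
        "confirmed": sorted(in_vitro & in_silico),
        # Found in the experiment only: candidates to add to the network.
        "novel": sorted(in_vitro - in_silico),
        # Present in the database network but not recovered experimentally.
        "database_only": sorted(in_silico - in_vitro),
    }


# Placeholder identifiers, not real proteins or results from the study.
experimental_hits = ["PROT_A", "PROT_B", "PROT_C"]
database_partners = ["PROT_B", "PROT_D"]

merged = merge_interactors(experimental_hits, database_partners)
for label, names in merged.items():
    print(label, names)
```

Keeping the three groups separate preserves traceability: each protein in the merged network can still be attributed to experimental evidence, database evidence, or both.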

Section snippets

Identification of the complex partners of histatin 1

Heterotypic complexes of histatin 1 with other salivary proteins in parotid saliva were identified using classical protein-protein interaction methods in combination with MS [27]. A Co-IP assay using magnetic beads and a pull-down assay with immunopure immobilized streptavidin beads were used to separate the histatin 1 complexes from the mixture. In-solution tryptic digestion was performed, and the proteins present in the complexes were identified using RP LC-ESI-MS/MS. Positive identification

Discussion

According to the PathGuide resource (http://pathguide.org), almost 300 protein-protein interaction databases are available [29], one of which is the STRING protein-protein network database, which has been maintained since the year 2000. Some qualities that distinguish STRING from other databases include its comprehensiveness, usability, quality control, and traceability [28]. Moreover, STRING covers the largest number of organisms and uses many input sources, called evidence channels, which

Conclusion

Due to the increasing complexity of functional associations among proteins, databases have become essential instruments in the study of protein-protein interactions. Huge benefits unfold from the correct use of these fantastic bioinformatics helpers. The STRING database fulfills the actual need for a tool that can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an

Acknowledgment

This work was supported by the Canadian Institutes of Health Research (CIHR grants # 106657 and 97577). W.L.S. is a recipient of a CIHR New Investigator Award (grant # 113166). K.T.B.C. holds a Queen Elizabeth II Graduate Scholarship in Science and Technology (QEII-GSST).

Author contributions statement

K.T.B.C. and W.L.S. conceived the manuscript; K.T.B.C. and E.B.M. conducted the analysis; K.T.B.C., W.L.S., Y.X., and E.B.M. analyzed the results; K.T.B.C., W.L.S., and E.B.M. wrote the manuscript. All authors reviewed the manuscript.

References (32)

  • A.J. Enright et al. Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol. (2001)
  • B. Snel et al. The identification of functional modules from the genomic association of genes. Proc. Natl. Acad. Sci. U. S. A. (2002)
  • A.W. Rives et al. Modular organization of cellular networks. Proc. Natl. Acad. Sci. U. S. A. (2003)
  • J. De Las Rivas et al. Interactome data and databases: different types of protein interaction. Comp. Funct. Genomics (2004)
  • R. Dannenfelser et al. Genes2FANs: connecting genes through functional association networks. BMC Bioinf. (2012)
  • M.E. Studham et al. Functional association networks as priors for gene regulatory network inference. Bioinformatics (2014)
  • UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. (2015)
  • S. Orchard et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat. Methods (2012)
  • S. Orchard et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. (2014)
  • A. Chatr-Aryamontri et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. (2015)
  • J. Das et al. HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. (2012)
  • B. Turner et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database (Oxford) (2010)
  • D. Alonso-Lopez et al. APID interactomes: providing proteome-based interactomes with controlled quality for multiple species and derived networks. Nucleic Acids Res. (2016)
  • K. Zuberi et al. GeneMANIA prediction server 2013 update. Nucleic Acids Res. (2013)
  • A.K. Wong et al. IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res. (2015)
  • M. Kotlyar et al. Integrated interactions database: tissue-specific view of the human and model organism interactomes. Nucleic Acids Res. (2016)