Review
Merging in-silico and in vitro salivary protein complex partners using the STRING database: A tutorial
Introduction
Studies of the biological roles of proteins in a system begin with the characterization of the individual proteins. However, proteins usually interact with other proteins in vivo for many purposes and in different ways. Proteins may interact directly by binding to other proteins as a defense mechanism against degradation, to modulate the function of one or both partners, or to rely on a partner as a delivery system for transport. In addition, proteins communicate through indirect interactions to regulate each other's production and half-life, to exchange reaction products, and to activate or deactivate signaling pathways, thus contributing to the functioning of the whole organism. Together, these direct and indirect interactions determine the functional associations among proteins [1], [2], [3], [4], [5], [6].
Different experimental methods can be used to identify proteins that interact directly to form heterotypic complexes. Co-immunoprecipitation (Co-IP) and/or pull-down assays coupled with mass spectrometry (MS) are most commonly used for this purpose. Both approaches recover protein complexes efficiently when the protein partners exhibit strong and stable interactions. In Co-IP, an antibody precipitates the target protein, which in turn co-precipitates its binding partner or protein complex from a mixture. In the pull-down assay, by contrast, a “bait” protein is used instead of an antibody to extract from the mixture the proteins that bind to it, or protein complexes that contain such proteins. Combining both methods identifies heterotypic complex partners more reliably and helps avoid false-positive results.
After separating the proteins of interest from a mixture and identifying each amino acid constituent and post-translational modification, all efforts turn to the difficult task of characterizing protein-protein interactions and their biological roles. Many databases and online resources have been created to assist in this work, and they can be grouped into three tiers. First, evidence-based protein-protein interactions are curated from the published literature by members of the UniProt [7] and IMEx [8], [9] consortia. Second, databases such as BioGRID [10], HINT [11], iRefWeb [12], and APID [13] build on this information to collect all data about experimentally proven interactions. Third, databases built on top of these experimental data on direct protein-protein interactions add data about indirect and predicted interactions to create a more comprehensive network. Examples from this last group of bioinformatics tools are GeneMANIA [14], Integrated Multi-species Prediction [15], the Integrated Interactions Database [16], HumanNet [17], FunCoup [18], and the STRING database [19], [20]. In these latter databases, each interaction is weighted by a confidence score.
In the STRING database, for example, each stored protein-protein interaction has a score (between zero and one) representing its confidence. The supporting evidence for each interaction is divided into seven “evidence channels.” The experiments channel comprises laboratory results that demonstrate protein interactions (including biochemical, biophysical, and genetic experiments), mostly data from the IMEx consortium and BioGRID. The database channel is manually curated and imported from pathway databases. The textmining channel indicates possible interactions between proteins that are mentioned together in PubMed abstracts, in an in-house selection of more than 3 million full-text articles, and in other text collections [21], [22]. The evidence is considered stronger if a concept such as “binding” or “phosphorylated by” is found to connect the mentioned proteins. The coexpression channel is based on normalized, pruned, and correlated gene expression data from many experiments [23]. The neighborhood channel is a genome-based prediction channel, in which genes are given scores if they are frequently seen in each other's genome neighborhood. In the fusion channel, pairs of proteins are given association scores if there is at least one organism in which their respective orthologs have fused into a single protein-coding gene. The co-occurrence channel evaluates the phylogenetic distribution of orthologs of all proteins in an organism; two proteins whose distributions are highly similar are assigned a score [24]. Finally, in addition to the seven evidence channels listed, the STRING database also benefits from the transfer of evidence from one organism to another, since orthologs of interacting proteins in one organism often also interact in other organisms; this is called “interolog” transfer [25], [26].
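The text above states only that each interaction carries a confidence score between zero and one, supported by seven evidence channels; it does not say how the per-channel scores are merged. The following is a minimal sketch of one commonly described scheme, in which channels are treated as independent pieces of evidence and a prior probability of interaction is removed before combining and added back afterwards. The function name, the channel keys, and the prior value of 0.041 are assumptions made for illustration, and the sketch omits any homology correction the database may apply to individual channels.

```python
from math import prod

# Assumed prior probability that two random proteins are associated.
# This value is an assumption for illustration; treat it as a parameter.
PRIOR = 0.041


def combine_channel_scores(channel_scores, prior=PRIOR):
    """Combine per-channel scores (each in [0, 1]) into one confidence score.

    channel_scores: dict mapping a channel name (hypothetical keys such as
    "experiments" or "textmining") to that channel's score.
    """
    def remove_prior(score):
        # Scores below the prior carry no evidence beyond chance.
        score = max(score, prior)
        return (score - prior) / (1.0 - prior)

    # Probability that no channel supports the interaction,
    # assuming the channels are independent.
    no_support = prod(1.0 - remove_prior(s) for s in channel_scores.values())
    combined = 1.0 - no_support

    # Add the prior back once, for the interaction as a whole.
    return combined * (1.0 - prior) + prior


if __name__ == "__main__":
    example = {"experiments": 0.6, "textmining": 0.5}
    print(round(combine_channel_scores(example), 3))
```

Under this scheme a single channel reproduces its own score, and every additional supporting channel can only raise the combined confidence, which matches the intuition that independent lines of evidence reinforce each other.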
In a previous study, our group identified 43 proteins in human saliva that participate in heterotypic complexes with histatin 1 [27], a salivary protein with many functions in the oral cavity, including strong antibacterial and antifungal functions. To exemplify the use of the STRING database in salivary proteomics research, this tutorial provides a detailed guide on how we used the STRING protein network database to combine two sets of results: the proteins that interact with histatin 1 as identified by an in-silico approach, and the interactors of histatin 1 in saliva identified in our previous in vitro experiments, in which Co-IP and pull-down assays were followed by MS. Herein, we demonstrate how this web tool, the STRING database, can be used to simulate protein-protein networks. We also discuss the importance and advantages of combining in-silico and in vitro approaches to provide a more comprehensive view of the real protein hub.
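The merging step described above can be sketched programmatically. The tutorial itself works through the STRING web interface, so the code below is only an illustrative equivalent under stated assumptions: the in-silico partners are read from a tab-separated export that has `preferredName_B` and `score` columns, a cut-off of 0.4 is used as a confidence threshold, and all protein names are placeholders rather than real histatin 1 interactors.

```python
import csv
import io


def parse_string_partners(tsv_text, min_score=0.4):
    """Return {partner name: score} for rows at or above min_score.

    Assumes a tab-separated table with 'preferredName_B' and 'score'
    columns; both the column names and the 0.4 cut-off are assumptions.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    partners = {}
    for row in reader:
        score = float(row["score"])
        if score >= min_score:
            partners[row["preferredName_B"]] = score
    return partners


def merge_partners(in_vitro, in_silico):
    """Classify partners by the approach that supports them."""
    observed = set(in_vitro)
    predicted = set(in_silico)
    return {
        "both": sorted(observed & predicted),
        "in_vitro_only": sorted(observed - predicted),
        "in_silico_only": sorted(predicted - observed),
        "merged": sorted(observed | predicted),
    }


if __name__ == "__main__":
    # Placeholder data: names and scores are invented for the example.
    export = (
        "preferredName_A\tpreferredName_B\tscore\n"
        "BAIT\tPARTNER_1\t0.92\n"
        "BAIT\tPARTNER_2\t0.25\n"
        "BAIT\tPARTNER_4\t0.55\n"
    )
    predicted = parse_string_partners(export)
    observed = ["PARTNER_1", "PARTNER_3"]  # e.g. hits from Co-IP/pull-down + MS
    for group, names in merge_partners(observed, predicted).items():
        print(group, names)
```

Keeping the three groups separate, rather than reporting only the union, preserves the provenance of each partner: proteins supported by both approaches, proteins seen only at the bench, and proteins that are so far only predicted.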
Identification of the complex partners of histatin 1
Heterotypic complexes of histatin 1 with other salivary proteins in parotid saliva were identified using classical protein-protein interaction methods in combination with MS [27]. A Co-IP assay using magnetic beads and a pull-down assay using immunopure immobilized streptavidin beads were used to separate the histatin 1 complexes from the mixture. In-solution tryptic digestion was performed, and the proteins present in the complex were identified using RP LC-ESI-MS/MS. Positive identification
Discussion
According to the PathGuide resource (http://pathguide.org), almost 300 protein-protein interaction databases are available [29]. One of them is the STRING protein-protein network database, which has been maintained since 2000. Qualities that distinguish STRING from other databases include its comprehensiveness, usability, quality control, and traceability [28]. Moreover, STRING covers the largest number of organisms and uses many input sources, called evidence channels, which
Conclusion
Due to the increasing complexity of functional associations among proteins, databases have become essential instruments in the study of protein-protein interactions. Considerable benefits follow from the correct use of these bioinformatics tools. The STRING database fulfills the current need for a tool that can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an
Acknowledgment
This work was supported by the Canadian Institutes of Health Research (CIHR grants # 106657 and 97577). W.L.S. is a recipient of a CIHR New Investigator Award (grant # 113166). K.T.B.C. holds a Queen Elizabeth II Graduate Scholarship in Science and Technology (QEII-GSST).
Author contributions statement
K.T.B.C. and W.L.S. conceived the manuscript; K.T.B.C. and E.B.M. conducted the analysis; K.T.B.C., W.L.S., Y.X., and E.B.M. analyzed the results; K.T.B.C., W.L.S., and E.B.M. wrote the manuscript. All authors reviewed the manuscript.
References (32)
[1] et al. Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol. (2001)
[2] et al. The identification of functional modules from the genomic association of genes. Proc. Natl. Acad. Sci. U. S. A. (2002)
[3] et al. Modular organization of cellular networks. Proc. Natl. Acad. Sci. U. S. A. (2003)
[4] et al. Interactome data and databases: different types of protein interaction. Comp. Funct. Genomics (2004)
[5] et al. Genes2FANs: connecting genes through functional association networks. BMC Bioinf. (2012)
[6] et al. Functional association networks as priors for gene regulatory network inference. Bioinformatics (2014)
[7] UniProt: a hub for protein information. Nucleic Acids Res. (2015)
[8] et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat. Methods (2012)
[9] et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. (2014)
[10] et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. (2015)
[11] HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol.
[12] iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database (Oxford)
[13] APID interactomes: providing proteome-based interactomes with controlled quality for multiple species and derived networks. Nucleic Acids Res.
[14] GeneMANIA prediction server 2013 update. Nucleic Acids Res.
[15] IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res.
[16] Integrated interactions database: tissue-specific view of the human and model organism interactomes. Nucleic Acids Res.