Review
Bioinformatics tools for secretome analysis

https://doi.org/10.1016/j.bbapap.2013.01.039

Abstract

Over recent years, analyses of secretomes (complete sets of secreted proteins) have been reported in various organisms, cell types, and pathologies, and such studies are quickly gaining popularity. Fungi secrete enzymes that can break down potential food sources; plant secreted proteins are primarily parts of the cell wall proteome; and human secreted proteins are involved in cellular immunity and communication, and provide useful information for the discovery of novel biomarkers, such as for cancer diagnosis. Continuous development of methodologies supports the broad identification and quantification of secreted proteins in a given cellular state. The role of secreted factors is also investigated in the context of the regulation of major signaling events, and connectivity maps are built to describe the differential expression and dynamic changes of secretomes. Bioinformatics has become the bridge between secretome data and computational tasks for managing, mining, and retrieving information. Predictions can be made based on this information, contributing to the elucidation of a given organism's physiological state and the determination of the specific malfunction in disease states. Here we provide an overview of the available bioinformatics databases and software that are used to analyze the biological meaning of secretome data, including descriptions of the main functions and limitations of these tools. The important challenges of data analysis are mainly related to the integration of biological information from dissimilar sources. Improvements in databases and developments in software will likely substantially contribute to the usefulness and reliability of secretome studies. This article is part of a Special Issue entitled: An Updated Secretome.

Highlights

► Bioinformatics provides tools for in silico and experimental secretome profiling. ► The knowledge of secreted proteins brings clear insight into the basic biology. ► Bioinformatics integration of secretome in systemic knowledge is needed. ► Tools to recognize diagnostic and therapeutic opportunities are crucial.

Introduction

The term secretome refers to a set of proteins that includes extracellular matrix (ECM) proteins, proteins shed from the cell membrane, and vesicle proteins (e.g., from exosomes and microsomal vesicles) [1], [2]. These secreted proteins play important roles in homeostasis, immune response, development, proteolysis, adhesion, and extracellular matrix organization. The secretome is highly dynamic, and its composition changes in response to various pathologies and environmental stimuli. Intracellular pathway and network analyses can provide mechanistic insights by linking proteins to their underlying cellular functions and to other key players known to be involved in these events.

While the earliest secretome analyses were performed in bacteria and fungi [2], there have now been many investigations into the mammalian secretome. The Secreted Protein Database (SPD, http://spd.cbi.pku.edu.cn/) is a collection of over 18,000 secreted proteins from the human, mouse, and rat proteomes; it includes sequences from Swiss-Prot, TrEMBL, Ensembl, and RefSeq [3]. It is estimated that, of the roughly 20,500 human protein-coding genes, approximately 10% encode secreted proteins [4], [5], [6].

The majority of secretome studies are conducted in vitro using cell culture methods in which secreted proteins are obtained from conditioned media of serum-starved cultured cell lines [7]. These studies routinely employ high-resolution separation techniques, such as two-dimensional gel electrophoresis and/or liquid chromatography, in combination with advanced mass spectrometric methods for the unequivocal identification of peptides and proteins in samples [8]. Very recently, a method that combines click chemistry and pulsed stable isotope labeling with amino acids in cell culture has been successfully adopted to selectively enrich and quantify secreted proteins in a background of serum-containing media [9]. Whichever analytical approach is used, high-confidence secreted proteins are identified by applying bioinformatics-based filters that remove from the analysis the non-secreted proteins released by broken and/or apoptotic cells present in the conditioned media of the cell lines. Such methodology typically involves interrogation of either primary sequence-based secretory pathway prediction algorithms or curated empirical subcellular localization databases (or a combination of both approaches).
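
The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular study: the `Protein` record, the accessions, the localization vocabulary in `EXTRACELLULAR_TERMS`, and the `mode` options are all hypothetical, and the sequence-based prediction is assumed to have been computed beforehand by an external tool and stored as a boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """One protein identified in conditioned medium (hypothetical record)."""
    accession: str                 # made-up identifier, not a real database entry
    predicted_secreted: bool       # call from a sequence-based predictor, computed elsewhere
    curated_locations: frozenset   # localization terms taken from a curated database


# Assumed vocabulary for "outside the cell"; real databases use their own terms.
EXTRACELLULAR_TERMS = frozenset({"secreted", "extracellular", "cell surface"})


def is_high_confidence(protein, mode="either"):
    """Decide whether a protein passes the secretome filter.

    mode selects the evidence required: "prediction", "curated",
    "either" (union of both) or "both" (intersection of both).
    """
    curated = bool(protein.curated_locations & EXTRACELLULAR_TERMS)
    if mode == "prediction":
        return protein.predicted_secreted
    if mode == "curated":
        return curated
    if mode == "either":
        return protein.predicted_secreted or curated
    if mode == "both":
        return protein.predicted_secreted and curated
    raise ValueError(f"unknown mode: {mode!r}")


def filter_secretome(proteins, mode="either"):
    """Keep only the proteins that pass the filter, preserving input order."""
    return [p for p in proteins if is_high_confidence(p, mode)]


# Toy input: four identifications with invented annotations.
identified = [
    Protein("P_A", True, frozenset({"secreted"})),
    Protein("P_B", False, frozenset({"cytoplasm"})),      # likely leaked from broken cells
    Protein("P_C", False, frozenset({"extracellular"})),  # curated evidence only
    Protein("P_D", True, frozenset({"nucleus"})),         # prediction only
]

lenient = [p.accession for p in filter_secretome(identified, "either")]
strict = [p.accession for p in filter_secretome(identified, "both")]
```

Requiring both lines of evidence gives a smaller, more conservative list, whereas accepting either one retains proteins that lack a predicted signal peptide but are annotated as extracellular in the curated source.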

An interesting fraction of secreted factors comprises cell surface receptor ligands, such as hormones, growth factors, and cytokines with important regulatory functions in biological processes [10], [11], [12]. Thus, studying the cell secretome composition in mammals can enable identification of proteins released into host fluids, which could be candidates for use in developing new diagnostic tests and possibly new treatments for diseases and disorders [13], [14]. This emphasizes the need for large-scale and unbiased analysis of the cell secretome. In recent years, vastly improved bioinformatics analysis tools and technical advances in mass spectrometry have driven remarkable progress in secretome science. The present review provides an overview of the databases and software used for the data tracking, analysis, and interpretation of the secretome, describing the main functions and limitations of these tools.

Section snippets

Computational methods for prediction of secreted proteins

Secreted proteins account for approximately 10% of the total proteins encoded by a genome. In the absence of experimental data, the secretome profile of a living cell can be generated with in silico approaches; many different but complementary bioinformatic tools can be used to predict prokaryotic and eukaryotic secreted proteins from genomic/transcriptomic annotations [15] (Fig. 1). Such predictions are possible because of the specific conserved features of secreted proteins. In eukaryotes,

Experimental profiling of cell line secreted proteins

Secreted and shed proteins that are released through classical and non-classical secretion pathways can be profiled using an in vitro system, where experimental and controlled conditions allow reproducible and quantifiable results (Fig. 1). Many cell types have been used in secretome studies. In mammalian cell cultures and in the majority of secretome analysis studies, cells are grown in bovine serum-free media. One alternative approach involves the supplementation of isotopically labeled amino

Secretome data interpretation

Bioinformatics tools (software and databases) are indispensable for data analysis and the construction of methodologies for interpreting secretome/proteome results (Fig. 1). Dependable data interpretation is necessary for the formulation and investigation of hypotheses relating to biological processes, and for the proposal of disease biomarkers and discovery of new drug targets (Fig. 2). It is possible to collect essential information about the proteins in a secretome using gene ontology (GO)

Bioinformatics-assisted standardization and sharing of datasets

Data sharing represents a new challenge in modern proteomics. There are several on-going international efforts to develop proteomics data standards to facilitate data sharing and reuse. The first obstacle to data sharing is the data format; each MS instrument generates raw data as files in proprietary formats (e.g., ABI/Sciex WIFF, Bruker FID/YEP/BAF, Thermo Scientific RAW, and Waters MassLynx file types). Recently, open source tools have been developed to convert proprietary formatted files to

Conclusions

The literature clearly contains an increasing number of publications on secretome identification and analysis, using both computational and experimental approaches. This wealth of studies provides an improved understanding of secretome biology in many types of organisms. Regarding fungal secreted proteins, such advancement can lead to greater development of various potential applications in bio-processing, environmental remediation industries, and in pathogenesis. Secreted proteins are also

Competing interests

The authors declare that they have no competing interests. All authors have read and approved the final manuscript.

Acknowledgements

This study was supported by grants from the Associazione Italiana per la Ricerca sul Cancro (AIRC n. 5896 and AIRC 5x1000 n. 12162) and the Italian Istituto Superiore di Sanità.

References (125)

  • K. Yanagisawa et al. Proteomic patterns of tumour subsets in non-small-cell lung cancer. Lancet (2003)
  • D.J. Pappin et al. Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. (1993)
  • B.D. Halligan et al. ZoomQuant: an application for the quantitation of stable isotope labeled peptides. J. Am. Soc. Mass Spectrom. (2005)
  • S. Cha et al. In situ proteomic analysis of human breast cancer epithelial cells using laser capture microdissection: annotation by protein set enrichment analysis and gene ontology. Mol. Cell Proteomics (2010)
  • H. Antelmann et al. A proteomic view on genome-based signal peptide predictions. Genome Res. (2001)
  • Y. Chen et al. Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT. Mamm. Genome (2003)
  • M. Clamp et al. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl. Acad. Sci. U.S.A. (2007)
  • H. Skalnikova et al. Mapping of the secretome of primary isolates of mammalian cells, stem cells and derived cell lines. Proteomics (2011)
  • M.A. Blanco et al. Global secretome analysis identifies novel mediators of bone metastasis. Cell Res. (2012)
  • R. Peterson et al. Secretome of the coprophilous fungus Doratomyces stemonitis C8, isolated from koala feces. Appl. Environ. Microbiol. (2011)
  • K. Eichelbaum et al. Selective enrichment of newly synthesized proteins for quantitative secretome analysis. Nat. Biotechnol. (2012)
  • J. Flier et al. Differential expression of CXCR3 targeting chemokines CXCL10, CXCL9, and CXCL11 in different types of skin inflammation. J. Pathol. (2001)
  • A.E. Pedersen et al. CD25 shedding by human natural occurring CD4+CD25+ regulatory T cells does not inhibit the action of IL-2. Scand. J. Immunol. (2009)
  • K. Walsh. Adipokines, myokines and cardiovascular disease. Circ. J. (2009)
  • A.L. Bonin-Debs et al. Development of secreted proteins as biotherapeutic agents. Expert. Opin. Biol. Ther. (2004)
  • R. Shah et al. Gene profiling of human adipose tissue during evoked inflammation in vivo. Diabetes (2009)
  • P.A. Lee et al. The bacterial twin-arginine translocation pathway. Annu. Rev. Microbiol. (2006)
  • F. Raimondo et al. Advances in membranous vesicle and exosome proteomics improving biological understanding and biomarker discovery. Proteomics (2011)
  • G. von Heijne. A new method for predicting signal sequence cleavage sites. Nucleic Acids Res. (1986)
  • K. Hiller et al. PrediSi: prediction of signal peptides and their cleavage positions. Nucleic Acids Res. (2004)
  • S.F. Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997)
  • K. Frank et al. High-performance signal peptide prediction based on sequence alignment techniques. Bioinformatics (2008)
  • J.S. Lai et al. Computational comparative study of tuberculosis proteomes using a model learned from signal peptide structures. PLoS One (2012)
  • H. Nielsen et al. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. (1997)
  • H. Nielsen et al. Prediction of signal peptides and signal anchors by a hidden Markov model. Proc. Int. Conf. Intell. Syst. Mol. Biol. (1998)
  • J.D. Bendtsen et al. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. (2004)
  • K.H. Choo et al. A comprehensive assessment of N-terminal signal peptides prediction methods. BMC Bioinforma. (2009)
  • H. Viklund et al. SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology. Bioinformatics (2008)
  • S.M. Reynolds et al. Transmembrane topology and signal peptide prediction using dynamic Bayesian networks. PLoS Comput. Biol. (2008)
  • D.T. Jones. Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics (2007)
  • T. Nugent et al. Transmembrane protein topology prediction using support vector machines. BMC Bioinforma. (2009)
  • T.N. Petersen et al. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods (2011)
  • E.L. Sonnhammer et al. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. (1998)
  • R.W. Rose et al. Adaptation of protein secretion to extremely high-salt conditions by extensive use of the twin-arginine translocation pathway. Mol. Microbiol. (2002)
  • J.D. Bendtsen et al. Prediction of twin-arginine signal peptides. BMC Bioinforma. (2005)
  • P.G. Bagos et al. Combined prediction of Tat and Sec signal peptides with hidden Markov models. Bioinformatics (2010)
  • A.S. Juncker et al. Prediction of lipoprotein signal peptides in Gram-negative bacteria. Protein Sci. (2003)
  • P.G. Bagos et al. Prediction of lipoprotein signal peptides in Gram-positive bacteria with a Hidden Markov Model. J. Proteome Res. (2008)
  • J.D. Bendtsen et al. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. (2004)
  • J.D. Bendtsen et al. Non-classical protein secretion in bacteria. BMC Microbiol. (2005)