Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics
Review
Functional annotation and biological interpretation of proteomics data
Introduction
Proteomics encompasses a broad range of high-throughput technologies that allow the identification and quantification of proteins in complex biological samples. Quantitative proteomics approaches rely on the ability to detect small changes in protein abundance in an altered state relative to a control or reference condition. Differences between two or more physiological states of a biological system can be expressed either as an absolute quantification, which determines the exact amount or concentration of a protein, or as a relative quantification, in which the amount of a protein is expressed as a fold change relative to the control sample, indicating whether the protein is up- or down-regulated [1], [2]. Proteomics approaches have been extensively applied in biomedical research to understand diseases, including protein-based biomarker discovery for the early detection and monitoring of different types of cancer [3], [4], the analysis of abnormal protein phosphorylation patterns associated with diseases [5], [6] such as Alzheimer's [7], and the identification of therapeutic targets [8], [9]. However, mass spectrometry-based proteomics often generates large lists of identified proteins that are challenging to interpret. Biostatistics and bioinformatics tools have therefore become indispensable for handling proteomics data and extracting biological relevance from the vast number of identified proteins [10]. Protein functional annotation through computational tools is now as important as protein identification itself. Since the advent of shotgun proteomics, many bioinformatics tools have been developed to provide methodologies for the functional annotation of proteomics data.
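The relative quantification described above, in which abundance is expressed as a fold change against the control, can be illustrated with a minimal Python sketch. The function names, the protein labels, the abundance values and the two-fold cutoff are all hypothetical and chosen only for illustration; the text does not prescribe a specific threshold or scale.

```python
import math


def log2_fold_change(treated, control):
    """Return log2(treated / control) for two positive abundance values."""
    if treated <= 0 or control <= 0:
        raise ValueError("abundances must be positive")
    return math.log2(treated / control)


def classify_regulation(treated, control, threshold=1.0):
    """Label a protein as 'up', 'down' or 'unchanged'.

    threshold is in log2 units (1.0 corresponds to a two-fold change).
    It is an assumed cutoff for this sketch, not a recommended value.
    """
    lfc = log2_fold_change(treated, control)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical abundances (arbitrary units): (altered state, control).
abundances = {
    "protein_A": (400.0, 100.0),
    "protein_B": (50.0, 200.0),
    "protein_C": (110.0, 100.0),
}

for name, (treated, control) in abundances.items():
    lfc = log2_fold_change(treated, control)
    print(name, round(lfc, 2), classify_regulation(treated, control))
```

Working on the log2 scale makes up- and down-regulation symmetric around zero, so a four-fold increase and a four-fold decrease have the same magnitude with opposite signs. In a real study the cutoff would normally be combined with a statistical test across replicates.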
Typical approaches to data interpretation for organisms without an annotated genome begin with automated protein annotation as the first step of the data analysis workflow. Protein domains, protein families, subcellular localization and biological function are predicted from sequence similarity searches [11], [12], [13], [14].
Once the protein sequences are functionally annotated, further tools are applied to search for functional patterns and overrepresentation of biological functions or processes in a protein dataset from qualitative or quantitative proteomics data. Subsequent steps usually include pathway analysis and the prediction of interaction networks, which are generated by integrating different layers of biological information, such as gene expression and co-expression patterns, protein–protein interactions and protein expression data. Moreover, visualization tools help locate targeted proteins within cellular pathways; signaling cascades and metabolic pathways are the most represented in proteomic studies.
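The search for overrepresented functions mentioned above can be sketched as a one-sided hypergeometric test, which is one common choice for this kind of analysis. The text does not name a specific test, so the choice of test, the function name and the counts below are assumptions made for illustration only.

```python
from math import comb


def overrepresentation_p_value(background_size, annotated_in_background,
                               list_size, annotated_in_list):
    """One-sided hypergeometric p-value for overrepresentation.

    Probability of seeing at least `annotated_in_list` proteins carrying
    a given annotation when `list_size` proteins are drawn without
    replacement from a background of `background_size` proteins, of which
    `annotated_in_background` carry the annotation.
    """
    upper = min(annotated_in_background, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(annotated_in_background, i)
        * comb(background_size - annotated_in_background, list_size - i)
        for i in range(annotated_in_list, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background proteins, 5 annotated with a term;
# 5 proteins in the dataset of interest, 3 of them annotated.
p_value = overrepresentation_p_value(20, 5, 5, 3)
print(f"p = {p_value:.4f}")
```

The result depends strongly on the background set, which should match the proteins that could have been identified in the experiment. When many annotation terms are tested at once, the p-values are usually corrected for multiple testing.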
A variety of commercial and open-source bioinformatics tools and statistical tests for the analysis of proteomics data have been developed. However, as the amount of proteomics data increases, new challenges in data handling, analysis and visualization drive the development of computational proteomics. To give an overview of the tools and approaches currently applied in proteomics functional annotation, we review and discuss the approaches, computational programs and strategies recently applied to data interpretation, and how different aspects of the analysis can modify the outcome of proteomics studies.
Section snippets
Biological meaning of large proteomics datasets through gene ontology-based annotation approaches
The prediction of the functional role of identified proteins in a biological event involves a first step of gathering information, a task that must be performed before the actual biological data interpretation is achieved and may include genome and proteome annotations. Many tools have been developed to mine several databases of biological information to finally predict a protein function based on sequence similarities. Detailed strategies on genomics and proteomics sequence annotation can be
Functional enrichment analysis for identification of overrepresented biological mechanisms
In proteomic studies, protein identity is the most important information for the further biological annotation of large data sets. Furthermore, proteins identified through shotgun proteomics approaches may be related to a broad diversity of biological functions, which may have a role in many different biological pathways. Moreover, variations in protein expression levels may indicate alterations of cellular mechanisms, such as changes resulting from the development of
Integration of functional annotations through biological network analysis
In recent years, several computational models of biological networks have been generated to aid the visualization of results from the simulation of biological systems, such as biochemical reaction [47] and protein interaction [48] networks. The aims of such models in biological research are to (1) systematically interrogate and experimentally verify knowledge of a pathway, (2) manage the complexity of cellular components and interactions, and (3) provide an outlook of properties
Conclusions
Datasets generated in proteomics experiments are usually large lists of protein identifications, and extracting biological meaning from these large datasets requires functional annotation.
Furthermore, pathway analysis and the generation of interaction networks based on previous data are fundamental to the visualization and interpretation of the biological processes involved in the conditions studied. In recent years, bioinformatics tools have been developed to gather the
Acknowledgments
This work was supported by FAPESP grants: 2009/54067-3, 2010/19278-0, 2009/52833-0 and CNPq grants: 470549/2011-4, 301702/2011-0 and 470268/2013-1 to AFPL and a CAPES fellowship to C.M.C.
References (77)
- et al.
Challenges and prospects for biomarker research: a current perspective from the developing world
Biochim. Biophys. Acta
(2014)
- et al.
Biomarker research with prospective study designs for the early detection of cancer
Biochim. Biophys. Acta
(2014)
- et al.
Phosphoproteome analysis reveals differences in phosphosite profiles between tumorigenic and non-tumorigenic epithelial cells
J. Proteomics
(2014)
- et al.
Quantitative proteomics analysis of phosphorylated proteins in the hippocampus of Alzheimer's disease subjects
J. Proteomics
(2011)
- et al.
The secretome in cancer progression
Biochim. Biophys. Acta
(2013)
- et al.
Bioinformatics analysis of mass spectrometry-based proteomics data sets
FEBS Lett.
(2009)
Automatic annotation of protein function
Curr. Opin. Struct. Biol.
(2005)
- et al.
GeneSetDB: a comprehensive meta-database, statistical and visualisation framework for gene set analysis
FEBS Open Biol.
(2012)
- et al.
In situ proteomic analysis of human breast cancer epithelial cells using laser capture microdissection: annotation by protein set enrichment analysis and gene ontology
Mol. Cell. Proteomics
(2010)
- et al.
Towards a functional proteomics approach to the comprehension of idiopathic pulmonary fibrosis, sarcoidosis, systemic sclerosis and pulmonary Langerhans cell histiocytosis
J. Proteomics
(2013)
Identification of potential bladder cancer markers in urine by abundant-protein depletion coupled with quantitative proteomics
J. Proteomics
Current trends in quantitative proteomics
J. Mass Spectrom.
Mass spectrometry-based proteomics turns quantitative
Nat. Chem. Biol.
Global and site-specific quantitative phosphoproteomics: principles and applications
Annu. Rev. Pharmacol. Toxicol.
ADAM17 mediates OSCC development in an orthotopic murine model
Mol. Cancer
Automatic prediction of protein function
Cell. Mol. Life Sci.
Protein function annotation by homology-based inference
Genome Biol.
Sequence-based feature prediction and annotation of proteins
Genome Biol.
Genome annotation past, present, and future: how to define an ORF at each locus
Genome Res.
A beginner's guide to eukaryotic genome annotation
Nat. Rev. Genet.
Genome annotation: from sequence to biology
Nat. Rev. Genet.
Towards principles for the design of ontologies used for knowledge sharing
Int. J. Hum. Comput. Stud.
Ontologies in biology: design, applications and future challenges
Nat. Rev. Genet.
From ontology to semantic similarity: calculation of ontology-based semantic similarity
ScientificWorldJournal
BioPortal: ontologies and integrated data resources at the click of a mouse
Nucleic Acids Res.
Ontologies for molecular biology and bioinformatics
In Silico Biol.
Gene ontology: tool for the unification of biology. The Gene Ontology Consortium
Nat. Genet.
Assessing identity, redundancy and confounds in Gene Ontology annotations over time
Bioinformatics
Impact of ontology evolution on functional analyses
Bioinformatics
BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks
Bioinformatics
A travel guide to Cytoscape plugins
Nat. Methods
CODEX: exploration of semantic changes between ontology versions
Bioinformatics
Use and misuse of the gene ontology annotations
Nat. Rev. Genet.
From proteome lists to biological impact—tools and strategies for the analysis of large MS data sets
Proteomics
Creating the gene ontology resource: design and implementation
Genome Res.
Quality of computationally inferred gene ontology annotations
PLoS Comput. Biol.
The GOA database in 2009—an integrated Gene Ontology Annotation resource
Nucleic Acids Res.
Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists
Nucleic Acids Res.