Pathway and network analysis in proteomics
Introduction
Proteomics, the collective study of all measured proteins in cells of a given condition, is inherently a systems science. It requires understanding not only the independent parts (protein constituents and their expression in a cell) but also the interplay of proteins, protein complexes, signaling pathways, and network modules as a whole in achieving biochemical functions. Ideker et al. (2001) introduced an integrated approach to identify metabolic networks and build cellular pathway models using measurements from DNA microarrays, protein expression, and protein interaction knowledge. This work gave systems biology researchers a practical example of how biological networks could be used to perform integrative functional genomics data analysis. By gaining system-wide perspectives on protein functions, Proteomics promises to help determine which subsets of proteins are essential in regulating specific biological processes. In Proteomics analysis, incorporating prior knowledge of how groups of proteins work in concert with each other or with other genes and metabolites has made it possible to unravel the complexity inherent in the analysis of cellular functions (MacBeath, 2002). New network biology and systems biology techniques have emerged in recent Proteomics studies (Bensimon et al., 2012, Sabidó et al., 2012), including studies of cancer (Goh and Wong, 2013).
There has been a rapid accumulation of data due to advances in Proteomics technologies (MacBeath, 2002). Proteomics data are often generated from high-throughput experimental platforms, e.g., two-dimensional (2D) gel electrophoresis, liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS), multiplexed immunoassays, and protein microarrays (Altelaar et al., 2013, Kingsmore, 2006). These platforms can assay thousands of proteins simultaneously from complex biological samples (Aebersold and Mann, 2003) to measure the relative abundance of proteins or peptides across biological conditions. More accurate quantitative measurement of peptides can be achieved by isotopically labelling the proteins in two different samples (Ong and Mann, 2005). As in Genomics, Proteomics studies have been widely used to extract functional and temporal signals from biological systems (Blagoev et al., 2004). Popular experimental techniques for measuring protein–protein interactions include the yeast two-hybrid (Y2H) system (Ito et al., 2001).
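To illustrate how isotopic labelling yields a relative abundance, the sketch below assumes a SILAC-style design (one common labelling scheme), in which the two samples carry "heavy" and "light" labels and every quantified peptide yields a heavy/light intensity pair. Summarizing a protein by the median of its peptide log2 ratios is an assumption made here for robustness to outlier peptides, not a procedure taken from the cited work; the input format and the function name `protein_log2_ratio` are hypothetical.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_pairs):
    """Median peptide log2(heavy/light) ratio for one protein, or None.

    peptide_pairs: list of (heavy_intensity, light_intensity) tuples, one per
    quantified peptide (hypothetical input format). Peptides missing either
    channel (intensity <= 0) are skipped.
    """
    log_ratios = [
        math.log2(heavy / light)
        for heavy, light in peptide_pairs
        if heavy > 0 and light > 0
    ]
    return median(log_ratios) if log_ratios else None


# Three peptides of one protein; the third is an outlier.
peptides = [(2000.0, 1000.0), (4100.0, 2050.0), (900.0, 100.0)]
print(protein_log2_ratio(peptides))  # → 1.0
```

A log2 ratio of 1.0 corresponds to a two-fold higher abundance in the heavy-labelled sample; the outlying third peptide does not move the median.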
In contrast to the recent accelerated application of next-generation sequencing (NGS) in biology, a primary hurdle slowing down Proteomics applications is the high variability of Proteomics data, which makes Proteomics analysis results difficult to interpret biologically (Colinge and Bennett, 2007). Data variation can arise from biological sample heterogeneity, variance in sample preparation and protein separation, the detection limits of various proteomics techniques, and inaccuracies in the pattern-matching peptide/protein identification or quantification performed by Proteomics data management software. The unusually high level of noise inherent in Proteomics data, compared with DNA microarray or NGS data, has made Proteomics experiments difficult to repeat and many statistical methods developed for Genomics applications ineffective. Many reviews cover the computational challenges (Vitek, 2009, Noble and MacCoss, 2012, Barla et al., 2008) and solutions that apply statistical machine learning approaches to the problem, e.g., support vector machines (SVM) (Elias et al., 2004), Markov clustering (Krogan et al., 2006), ant colony optimization (Ressom et al., 2007), and semi-supervised learning (Käll et al., 2007). The ultimate challenge, however, is how to extract functional and biological information from a long list of proteins identified or discovered in high-throughput Proteomic experiments, in order to provide insights into the molecular mechanisms underlying different conditions (Khatri et al., 2012). Therefore, additional protein functional knowledge, e.g., protein abundance, cellular locations, protein complexes, and gene/protein regulatory pathways, should be incorporated in a second phase of Proteomics analysis in order to filter out noisy protein identifications that slip through the first, statistical phase.
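The first, statistical phase usually ends with a per-protein p-value that must be corrected for testing thousands of proteins at once. As one common choice (an assumption here; the text does not prescribe a method), the Benjamini–Hochberg step-up procedure controls the false discovery rate. The protein IDs and p-values below are invented for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the proteins called significant at false discovery rate `alpha`.

    pvalues maps protein ID -> raw p-value. Benjamini-Hochberg step-up rule:
    find the largest rank k with p_(k) <= k * alpha / m, then accept the k
    proteins with the smallest p-values.
    """
    m = len(pvalues)
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    k_max = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k * alpha / m:
            k_max = k
    return {protein for protein, _ in ranked[:k_max]}


pvals = {"P1": 0.001, "P2": 0.008, "P3": 0.039, "P4": 0.041, "P5": 0.20, "P6": 0.74}
print(sorted(p for p, v in pvals.items() if v < 0.05))  # → ['P1', 'P2', 'P3', 'P4']
print(sorted(benjamini_hochberg(pvals)))  # → ['P1', 'P2']
```

Four proteins pass a naive p < 0.05 cut-off, but only two survive FDR control at 5%. Even this corrected list still carries the noise discussed above, which is what the knowledge-based second phase is meant to address.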
Pathway and network analysis techniques can help address the challenge of interpreting Proteomics results. Analysis of proteomic data at the pathway level has become increasingly popular (Fig. 1). By pathway analysis, we mean data analysis that aims to identify activated pathways or pathway modules from functional proteomic data. Biological pathways can be grouped into signaling pathways, gene regulatory pathways, and metabolic pathways, all of which are carefully curated from reputable scientific publications. Pathway analysis can help organize a long list of proteins into a short list of pathway knowledge maps, making it easier to interpret the molecular mechanisms underlying these altered proteins or their expression (Khatri et al., 2012). By network analysis, we mean data analysis that builds, overlays, visualizes, and infers protein interaction networks from functional Proteomics and other systems biology data. Network analysis usually draws on graph theory, information theory, or Bayesian theory. Unlike pathway analysis, network analysis uses comprehensive network wiring diagrams, derived both from prior experimental sources and from new in silico predictions, to gain systems-level biological meaning (Wu and Chen, 2009). Many large knowledge bases of biological pathways and protein networks have been published, e.g., the BioGRID (Chatr-aryamontri et al., 2013), STRING (Franceschini et al., 2013), KEGG (Kanehisa and Goto, 2000), Reactome (Matthews et al., 2009), BioCarta (Nishimura, 2001), PID (Schaefer et al., 2009), HAPPI (Chen et al., 2009), HPD (Chowbina et al., 2009), and PAGED (Huang et al., 2012) databases.
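A common way to map a protein list onto pathways is an over-representation test: for each pathway, ask whether more of the experiment's hits fall into it than chance would predict, given the set of proteins that could have been detected. The sketch below computes the one-sided hypergeometric p-value with only the standard library. The toy background, pathway, and hit sets are hypothetical, and real tools differ in how they choose the background and correct for testing many pathways.

```python
from math import comb


def enrichment_pvalue(hits, pathway, background):
    """One-sided hypergeometric p-value for over-representation of `pathway` in `hits`.

    With N background proteins, K of them in the pathway, n hits, and k hits
    inside the pathway, returns P(X >= k) for X ~ Hypergeometric(N, K, n):
    sum over i >= k of C(K, i) * C(N - K, n - i) / C(N, n).
    """
    hits = hits & background  # only proteins that could be observed count
    pathway = pathway & background
    N, K, n = len(background), len(pathway), len(hits)
    k = len(hits & pathway)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


background = {f"P{i}" for i in range(20)}
pathway = {"P0", "P1", "P2", "P3", "P4"}
hits = {"P0", "P1", "P2", "P10"}
print(round(enrichment_pvalue(hits, pathway, background), 4))  # → 0.032
```

Three of the four hits fall into a pathway that covers only a quarter of the background, which is unlikely by chance (p ≈ 0.032 before any correction for the number of pathways tested).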
Compared with pathway and network analysis approaches applied in genomics, related research in proteomics offers the following advantages: (1) Pathway analysis of proteomic data can be interpreted directly in signaling pathways in terms of signaling proteins. (2) Network analysis of proteomic data can draw on direct evidence from protein–protein interaction data validated by in vitro experiments. (3) Both pathway analysis and network analysis of proteomic data can be visualized in a functional protein network with transcription factors labeled; in genomic studies, these proteins are all measured only indirectly.
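The network side can be sketched in the same spirit: overlay the hit proteins on a protein–protein interaction network, keep only interactions between hits, and read off the connected components as candidate modules. This is a deliberately simple stand-in for the module-finding methods referenced in this review. In practice the edge list would come from a resource such as BioGRID or STRING, but the protein names and pairs below are invented, and `interaction_modules` is a hypothetical helper.

```python
from collections import defaultdict


def interaction_modules(interactions, hits):
    """Connected components of the interaction subnetwork induced by `hits`.

    interactions: iterable of (protein_a, protein_b) pairs.
    hits: set of proteins flagged in the experiment.
    Returns a list of modules (sets of proteins), largest first; hits with no
    interaction to another hit come back as singleton modules.
    """
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a in hits and b in hits and a != b:  # keep hit-hit edges only
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, modules = set(), []
    for start in sorted(hits):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(component)
    return sorted(modules, key=len, reverse=True)


ppi = [("A", "B"), ("B", "C"), ("C", "X"), ("D", "E"), ("F", "Y")]
hits = {"A", "B", "C", "D", "E", "F"}
modules = interaction_modules(ppi, hits)
print([sorted(module) for module in modules])  # → [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

Interactions with non-hits (C–X, F–Y) are dropped, so F ends up isolated even though it has a known partner; whether to keep such "linker" proteins is a design choice that real module-finding methods handle in different ways.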
Section snippets
Pathway and network analysis for proteomics
Many pathway databases and pathway analysis software tools have become available in the last decade (Khatri et al., 2012, Ramanan et al., 2012), some of them directly applicable to Proteomics (Goh and Wong, 2013, Goh et al., 2012). In Proteomics, lists of statistically significant proteins identified from high-throughput Proteomic instruments often suffer from a high false discovery rate (Vitek, 2009), partly because the inherently high level of variance in Proteomics data can make it difficult to identify
Network analysis for complex protein networks
Complex protein networks are often characterized by scale-free properties (Barabási and Oltvai, 2004), i.e., their node degree distributions follow power laws. Such networks are highly tolerant of errors: their remaining nodes can still communicate even when unrealistically high fractions of randomly chosen nodes fail (Albert et al., 2000). This error tolerance appears not only in complex protein networks but also in many other types of scale-free networks, such as the World Wide Web (WWW), the Internet, social networks and cell
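The robustness property can be reproduced in a small simulation. The sketch below grows a scale-free graph by Barabási–Albert preferential attachment (a synthetic stand-in for a protein network, not real interaction data), removes 10% of the nodes either at random or highest-degree first, and reports the size of the largest connected component that remains. The graph size, attachment parameter `m`, and random seeds are arbitrary choices made for illustration.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=0):
    """Grow an undirected scale-free graph with n nodes by preferential attachment.

    Each new node links to m distinct existing nodes, chosen with probability
    proportional to their current degree. Returns {node: set(neighbours)}.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Seed network: a clique of m + 1 nodes, so every node starts with degree m.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `pool` once per edge end, so a uniform draw from
    # `pool` is a degree-proportional draw over nodes.
    pool = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            pool.extend((new, t))
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative depth-first search
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


graph = barabasi_albert(n=2000, m=2)
degree_counts = Counter(len(nbrs) for nbrs in graph.values())
print("most common degrees:", degree_counts.most_common(3))
print("maximum degree:", max(degree_counts))

k = len(graph) // 10  # remove 10% of the nodes
random_removed = set(random.Random(1).sample(sorted(graph), k))
hubs_removed = set(sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:k])
print("largest component, random failures:", largest_component(graph, random_removed))
print("largest component, hubs removed:   ", largest_component(graph, hubs_removed))
```

Most nodes have only two or three links while a few hubs have many. Random failures leave nearly all surviving nodes in one component, whereas removing the same number of hubs shrinks it more, the contrast between error and attack tolerance examined by Albert et al. (2000).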
Summary
Due to the data variability inherent in Proteomics measurements, statistical significance alone is insufficient for evaluating Proteomics results. We believe that both the functional and the topological information of pathway models should be integrated to make Proteomics data interpretation relevant to biological mechanisms. With the availability of two types of information, one on protein functional categories and the other on network topological features, we can categorize pathway
Acknowledgements
This work is partly supported by Indiana Center for Systems Biology and Personalized Medicine (CSBPM) and Wenzhou Medical University.
References (82)
- Pathway analysis of genomic data: concepts, methods, and prospects for future development. Trends Genet. (2012)
- Mass spectrometry-based proteomics for systems biology. Curr. Opin. Biotechnol. (2012)
- Mass spectrometry-based proteomics. Nature (2003)
- Error and attack tolerance of complex networks. Nature (2000)
- Next-generation proteomics: towards an integrative view of proteome dynamics. Nat. Rev. Genet. (2013)
- Pathguide: a pathway resource list. Nucleic Acids Res. (2006)
- Fast optimal leaf ordering for hierarchical clustering. Bioinformatics (2001)
- Scale-free networks: a decade and beyond. Science (2009)
- Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. (2004)
- Network medicine: a network-based approach to human disease. Nat. Rev. Genet. (2011)
- Machine learning methods for predictive proteomics. Brief. Bioinf.
- Mass spectrometry-based proteomics and network biology. Annu. Rev. Biochem.
- Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics. Nat. Biotechnol.
- Gene expression profiles of prostate cancer reveal involvement of multiple molecular pathways in the metastatic process. BMC Cancer
- The BioGRID interaction database: 2013 update. Nucleic Acids Res.
- HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics
- A systems biology approach to the study of cisplatin drug resistance in ovarian cancers. J. Bioinform. Comput. Biol.
- HPD: an online integrated human pathway database enabling systems biology studies. BMC Bioinf.
- Network-based classification of breast cancer metastasis. Mol. Syst. Biol.
- Introduction to computational proteomics. PLoS Comput. Biol.
- GeneSigDB—a curated database of gene expression signatures. Nucleic Acids Res.
- DAVID: database for annotation, visualization, and integrated discovery. Genome Biol.
- Identifying functional modules in protein–protein interaction networks: an integrated exact approach. Bioinformatics
- Ant colony optimization. Encyclopedia of Machine Learning
- A systems biology approach for pathway level analysis. Genome Res.
- Modeling cancer progression via pathway dependencies. PLoS Comput. Biol.
- Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol.
- STRING v9.1: protein–protein interaction networks, with increased coverage and integration. Nucleic Acids Res.
- How advancement in biological network analysis methods empowers proteomics. Proteomics
- Networks in proteomics analysis of cancer. Curr. Opin. Biotechnol.
- From molecular to modular cell biology. Nature
- Efficient and accurate greedy search methods for mining functional modules in protein interaction networks. BMC Bioinf.
- PAGED: a pathway and gene-set enrichment database to enable molecular phenotype discoveries. BMC Bioinf.
- Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science
- A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci.
- Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods
- KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res.
- Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput. Biol.
- A gene expression map for Caenorhabditis elegans. Science
- Multiplexed protein measurement: technologies and applications of protein and antibody arrays. Nat. Rev. Drug Discovery