Research article
Understanding the mechanisms of cancers based on function sub-pathways
Introduction
Despite the development of high-throughput technology, extracting biologically meaningful information from the rapidly accumulating data remains a major challenge. One approach is to detect the differentially expressed genes (DEGs) between two conditions and analyze the common “themes” among these genes. For example, mapping DEGs onto Gene Ontology (GO) terms is the most widely used approach and is available in many online GO analysis tools (Draghici et al., 2003; Draghici, 2002; Khatri and Draghici, 2005). The disadvantage of these methods is that they focus only on DEGs without considering the interactions between genes, and may therefore produce false positives because of noise in the data. Pathway databases capture such interactions; the most widely used include the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000) and Reactome (Joshi-Tope et al., 2005).
Pathway (or functional group) analysis of DEGs has gone through four stages. The over-representation approach (ORA) compares the number of DEGs observed in a given pathway with the number expected to hit that pathway by chance (Khatri et al., 2012); examples include Onto-Express (Khatri et al., 2002) and GOEAST (Zheng and Wang, 2008). Functional class scoring (FCS) detects coordinated changes in the expression of the genes in a pathway; gene-set enrichment analysis (GSEA) is a classical FCS method (Subramanian et al., 2005; Mootha et al., 2003). The pathway-topology-based (PT-based) approach, such as signaling-pathway impact analysis (SPIA) (Tarca et al., 2009) and ScorePAGE (Rahnenfuhrer et al., 2004), takes into account the relationships among the constituent genes of a pathway. Finally, as more and more evidence indicates that abnormalities in local regions of pathways, named sub-pathways, may contribute to the etiology of diseases (Li et al., 2012, 2009; Li et al., 2011), many sub-pathway analysis methods have been proposed to identify disease-related pathways, such as k-clique (Li et al., 2009), DEgraph (Jacob and Dudoit, 2012), clipper (Martini et al., 2013), Pathiways (Sebastian-Leon et al., 2014), and sub_SPIA (Li et al., 2015).
How a sub-pathway is defined is critical for sub-pathway analysis. In previous studies, it was defined as a homogeneous subgraph in DEgraph (Jacob and Dudoit, 2012), a linear subpath of a junction tree in clipper (Martini et al., 2013), a linear path of a pathway in Pathiways (Sebastian-Leon et al., 2014), a k-clique in SubpathwayMiner (Li et al., 2009), or a minimum spanning tree (MST) in sub_SPIA (Li et al., 2015). Although these methods can identify some potential cancer-related pathways, the sub-pathways they identify generally have no explicit function. In KEGG pathways, many downstream nodes represent specific biological functions. For example, the apoptosis signaling pathway can lead to opposite functions: apoptosis and degradation (cell death) or anti-apoptosis (cell survival). Differential expression of the upstream genes interrupts normal signal transmission and thereby changes the cells’ behavior. In 2014, Sebastian-Leon et al. proposed treating an input-output circuit as a sub-pathway and studied its probability of activation (or inhibition) based on gene expression (Sebastian-Leon et al., 2014). The activation (or inhibition) of a specific function can be used to understand the mechanisms of diseases. However, a downstream function usually has multiple upstream inputs rather than only one, and considering multiple inputs for one output makes computing the probability of activation (or inhibition) of the function node very complicated.
In this paper, we define a sub-pathway as the set of nodes that transmit upstream signals to a specific functional node, and we name the method function sub-pathway analysis (FSPA). The significance of a sub-pathway is calculated from its number of DEGs by the hypergeometric test. Generally, the more DEGs a sub-pathway contains, the more significant it is; the significance therefore reflects the degree to which the DEGs perturb the sub-pathway. We applied FSPA to six datasets covering three cancers: colorectal cancer, pancreatic cancer and lung cancer. Compared with ORA, GSEA, SPIA and k-clique, FSPA not only identifies more cancer-related pathways but also performs consistently across two independent datasets of the same cancer. More importantly, the abnormal sub-pathways can be used to further analyze the mechanisms of different cancers at the level of biological function.
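To illustrate the definition above, the following minimal Python sketch collects the nodes upstream of a functional node in a directed pathway graph and scores the resulting sub-pathway with a one-sided hypergeometric test. It is not the FSPA implementation: the edge list, gene labels, DEG calls, and the function names `function_subpathway` and `hypergeom_pvalue` are hypothetical, and using all genes of the toy pathway as the background set is an assumption.

```python
from math import comb


def function_subpathway(edges, function_node):
    """Collect function_node and every node that can reach it along directed edges."""
    parents = {}
    for source, target in edges:
        parents.setdefault(target, set()).add(source)

    reached = {function_node}
    stack = [function_node]
    while stack:  # depth-first walk against the edge direction
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in reached:
                reached.add(parent)
                stack.append(parent)
    return reached


def hypergeom_pvalue(n_genes, n_degs, n_sub, n_sub_degs):
    """One-sided P(X >= n_sub_degs) when n_sub genes are drawn from n_genes,
    of which n_degs are DEGs."""
    upper = min(n_degs, n_sub)
    tail = sum(
        comb(n_degs, k) * comb(n_genes - n_degs, n_sub - k)
        for k in range(n_sub_degs, upper + 1)
    )
    return tail / comb(n_genes, n_sub)


# Hypothetical toy pathway: genes A-E and two function nodes.
toy_edges = [
    ("A", "B"), ("C", "B"), ("B", "Apoptosis"),
    ("B", "D"), ("E", "D"), ("D", "Survival"),
]
toy_genes = {"A", "B", "C", "D", "E"}
toy_degs = {"A", "C", "E"}  # assumed DEG calls, for illustration only

# Keep only gene nodes; the function node itself is not a gene (an assumption).
sub_genes = function_subpathway(toy_edges, "Apoptosis") & toy_genes
p_value = hypergeom_pvalue(
    len(toy_genes), len(toy_degs), len(sub_genes), len(sub_genes & toy_degs)
)
print(sorted(sub_genes), p_value)
```

A smaller p-value indicates that the sub-pathway contains more DEGs than expected by chance. On a toy graph this small the value is not meaningful; the sketch only shows the mechanics.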
Results
The p-values of a sub-pathway under FSPA, ORA and k-clique are computed by the hypergeometric test. The p-values of SPIA and GSEA are calculated directly with the SPIA (Tarca et al., 2009) and EnrichmentBrowser (Geistlinger et al., 2016) R packages. The FDR-adjusted p-values of SPIA, ORA, FSPA, and k-clique are computed from the nominal p-values using the R function “p.adjust”. For FSPA and k-clique, the p-value of a pathway is assigned as the minimal
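The FDR adjustment of nominal p-values can be sketched in Python as follows. The text names only the R function “p.adjust” and not the method passed to it, so the Benjamini-Hochberg procedure used here is an assumption, and the function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical nominal p-values for four pathways.
nominal = [0.01, 0.04, 0.03, 0.005]
fdr = bh_adjust(nominal)
print(fdr)
```

Each adjusted value is the nominal p-value scaled by the number of tests over its rank, capped at 1 and made monotone. This matches the usual definition of the Benjamini-Hochberg adjustment.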
Discussion
Understanding the mechanisms of cancers based on differentially expressed genes is very important for identifying potential biomarkers and designing proper therapeutic strategies. Many methods have been proposed to analyze the functions of genes at the pathway level. The main problem with most sub-pathways identified by previous methods is that they have no explicit biological meaning, which hinders their application in practice.
Recently, Sebastian-Leon et al. proposed an
Data
Six datasets are analyzed: two colorectal cancer datasets, two lung cancer datasets, and two pancreatic cancer datasets (see Table 2). The microarray platform of all six datasets is Affymetrix HG-U133 Plus 2.0. Differences in gene expression between the two sample groups are calculated using established functions from the limma package (Smyth, 2005). DEGs are identified separately for each of the six gene expression profiles described above. The thresholds of DEGs are set as p-value (p.adjust) < 0.05
Methods
Authors’ contributions
W.L. and P.X. designed the methods. B.Z. analyzed the data. W.L. and Z.B. wrote the manuscript.
Competing interests
The authors declare that no conflict of interest exists regarding the publication of this article.
Acknowledgments
This work was funded in part by the National Science Foundation of China (grant nos. 61572367, 61272018 and 61573017).
References (32)
Global functional profiling of gene expression. Genomics (2003)
Association between atherosclerosis and female lung cancer—a Danish cohort study. Lung Cancer (2003)
Profiling gene expression using onto-express. Genomics (2002)
FKBP51 affects cancer cell response to chemotherapy by negatively regulating Akt. Cancer Cell (2009)
Combined gene expression analysis of whole-tissue and microdissected pancreatic ductal adenocarcinoma identifies genes specifically overexpressed in tumor epithelia. Hepatogastroenterology (2008)
Statistical intelligence: effective analysis of high-density microarray data. Drug Discov. Today (2002)
Inflammation, adenoma and cancer: objective classification of colon biopsy specimens with gene expression signature. Dis. Mark. (2008)
Bioconductor's EnrichmentBrowser: seamless navigation through combined results of set- & network-based enrichment analysis. BMC Bioinform. (2016)
Evaluation of microarray preprocessing algorithms based on concordance with RT-PCR in clinical samples. PLoS One (2009)
A susceptibility gene set for early onset colorectal cancer that integrates diverse signaling pathways: implication for tumorigenesis. Clin. Cancer Res. (2007)
Gene expression-based classification of non-small cell lung carcinomas and survival prediction. PLoS One
More power via graph-structured tests for differential expression of gene networks. Ann. Appl. Stat.
Reactome: a knowledgebase of biological pathways. Nucleic Acids Res.
KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res.
Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics
Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput. Biol.