Identifying of biomarkers associated with gastric cancer based on 11 topological analysis methods of CytoHubba

Ma, Hua; He, Zhihui; Chen, Jing; Zhang, Xu; Song, Pingping

doi:10.1038/s41598-020-79235-9

Article
Open access
Published: 14 January 2021

Scientific Reports volume 11, Article number: 1331 (2021)

Abstract

Gastric cancer (GC) is one of the most common types of malignancy, and its molecular mechanism has not been clarified. In this study, we aimed to explore potential biomarkers and prognosis-related hub genes associated with GC. The gene chip dataset GSE79973 was downloaded from the GEO database, and the limma package was used to identify differentially expressed genes (DEGs). A total of 1269 up-regulated and 330 down-regulated genes were identified. The protein-protein interaction (PPI) network of the DEGs was constructed with the STRING V11 database, and 11 hub genes were selected through the intersection of 11 topological analysis methods of CytoHubba, a Cytoscape plug-in. All 11 selected hub genes were found in the highest-scoring module of the PPI network of all DEGs, as identified by the molecular complex detection (MCODE) clustering algorithm. To explore the role of the 11 hub genes, we performed GO function and KEGG pathway analysis and found that the genes were enriched in a variety of functions and pathways, among which cellular senescence, cell cycle, viral carcinogenesis and the p53 signaling pathway were the most closely associated with GC. Kaplan-Meier analysis revealed that 10 of the 11 hub genes were related to the overall survival of GC patients. Further, seven of the 11 selected hub genes (C3, CDK1, FN1, CCNB1, CDC20, BUB1B and MAD2L1) were verified as significantly correlated with GC by univariable or multivariable Cox models and LASSO regression analysis. These seven genes may serve as potential prognostic biomarkers and therapeutic targets for GC.

Introduction

Gastric cancer (GC), a common heterogeneous disease, is one of the most deadly malignancies worldwide, especially in East Asia¹. In the past, many patients missed the optimal window for diagnosis and treatment, which led to tumor cell metastasis and progression to advanced cancer. Currently, there is a lack of effective biomarkers for early diagnosis. Comprehensive treatment and cancer surveillance have been identified as being among the major obstacles to improving the prognosis of gastric cancer². Therefore, a deeper understanding of the mechanisms involved in GC progression and the identification of potential biomarkers and targets for the diagnosis, prognosis and therapy of GC are urgently needed.

Bioinformatics analysis methods, including the analysis of gene interaction networks, gene annotation and microarray expression profiles, are powerful tools for identifying potential biomarkers related to diagnosis and treatment³. For example, Hao et al. identified 10 genes (COL1A1, COL3A1, COL1A2, COL5A2, FN1, THBS1, COL5A1, SPARC, COL18A1 and COL11A1) as potential biomarkers and therapeutic targets for GC by analyzing data from the Gene Expression Omnibus (GEO) database⁴. Furthermore, by analyzing data from GEO and the Cancer Genome Atlas (TCGA), Zhu et al. found that CDK1 overexpression was a prognostic factor for hepatocellular carcinoma (HCC), which makes it a potential therapeutic target and biomarker for HCC diagnosis⁵. Liao et al. identified two genes (SERPINE1 and SPARC) as potential biomarkers and therapeutic targets for GC⁶. However, each of these studies relied on a single method of hub gene selection, and the molecular mechanism of gastric cancer remains unclear and needs further exploration.

In this study, DEGs between human GC tissues and normal tissues were identified using limma based on GEO datasets. A directed acyclic graph was constructed with the BiNGO plug-in to view the overall enrichment of the DEGs. Next, we constructed a PPI network of the DEGs based on the STRING V11 database and visualized it using Cytoscape software. Eleven topological analysis methods were then applied to select the hub genes, and the important module in the network related to the hub genes was extracted by MCODE. In addition, to explore the role of the 11 selected hub genes in the pathogenesis of GC, we performed GO function and KEGG pathway enrichment analysis for these genes. Finally, Kaplan-Meier analysis was performed to evaluate the prognostic value of the hub genes, and Cox models and LASSO regression analysis were used to verify them further.

Results

Data preprocessing

The weight and residual sign maps (Fig. 1a–b) show that all the points were evenly distributed. In the relative logarithmic expression plot (Fig. 1c), all the samples were centered near zero, with no outliers. In the RNA degradation plot (Fig. 1d), the 5′ end is generally expected to be lower than the 3′ end⁷. Together, these results showed that the data were of good quality and suitable for downstream analysis.
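
The relative logarithmic expression (RLE) check mentioned above can be illustrated with a minimal sketch. It assumes the usual definition of RLE (each log-scale expression value minus that gene's median across all samples) and uses made-up sample names and values; it is not the authors' preprocessing code, which operates on probe-level data.

```python
from statistics import median

def relative_log_expression(log_expr):
    """Compute RLE values.

    log_expr maps a sample name to a list of log2 expression values,
    with genes in the same order for every sample.
    """
    samples = list(log_expr)
    n_genes = len(log_expr[samples[0]])
    # Per-gene median across all samples.
    gene_medians = [median(log_expr[s][g] for s in samples) for g in range(n_genes)]
    # Deviation of each value from its gene's median.
    return {
        s: [log_expr[s][g] - gene_medians[g] for g in range(n_genes)]
        for s in samples
    }

# Hypothetical toy data: three samples, four genes.
toy = {
    "GC_1": [8.1, 5.0, 10.2, 6.9],
    "GC_2": [8.0, 5.2, 10.0, 7.1],
    "Normal_1": [7.9, 5.1, 10.1, 7.0],
}
rle = relative_log_expression(toy)
# A well-behaved sample has an RLE median close to zero.
sample_medians = {s: median(values) for s, values in rle.items()}
print(sample_medians)
```

A sample whose RLE median sits far from zero, or whose RLE spread is much wider than the others, would be a candidate outlier.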

Differentially expressed genes and enrichment analysis

Using the limma method, a total of 1599 DEGs were identified in the dataset GSE79973, of which 1269 were up-regulated and 330 were down-regulated (Fig. 2). The Cytoscape plug-in BiNGO generated a directed acyclic graph (Fig. 3), in which branches represent inclusion relationships. Following the arrows from top to bottom, the scope of the functions becomes progressively narrower, and a deeper color indicates a higher degree of enrichment. The graph was divided into three parts, representing BP, MF and CC. These genes were thought to be involved in the regulation of RNA metabolic processes, positive and negative regulation of phosphorus metabolic processes in the adaptive immune response based on somatic cells, extracellular matrix tissue, among others.
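
The split of DEGs into up- and down-regulated sets can be sketched as a simple filter over a table of log2 fold changes and adjusted p-values. The cutoffs below (|log2 FC| > 1, adjusted p < 0.05) and the gene names are illustrative assumptions; this section does not state the thresholds used for DEG selection.

```python
def classify_degs(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and down-regulated lists.

    rows is an iterable of (gene, log2_fold_change, adjusted_p_value).
    The default cutoffs are assumptions for illustration only.
    """
    up, down = [], []
    for gene, log_fc, adj_p in rows:
        if adj_p >= p_cutoff:
            continue  # not significant
        if log_fc > lfc_cutoff:
            up.append(gene)
        elif log_fc < -lfc_cutoff:
            down.append(gene)
    return up, down

# Hypothetical results table.
toy_rows = [
    ("GENE_A", 2.3, 0.001),   # significant, up
    ("GENE_B", -1.8, 0.010),  # significant, down
    ("GENE_C", 0.4, 0.002),   # significant but small effect
    ("GENE_D", 3.1, 0.200),   # large effect but not significant
]
up, down = classify_degs(toy_rows)
print(up, down)
```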

PPI network and GC-associated clustering module construction

Based on the 1599 DEGs, a PPI network was constructed in which 1376 genes formed the network with 14,394 edges (Fig. 4a). Hub genes were identified by 11 topological analysis methods, with the top 20 genes selected for each method (Supplementary file 1). C3, CDK1, AURKA, CDC20, CCNA2, AURKB, CCNB1, BUB1B, MAD2L1, UBE2C and FN1 were found in the intersection of at least five methods and were selected as GC-related hub genes. We also obtained the clustering module with the highest score from the PPI network of all DEGs (Fig. 4b) using the MCODE algorithm. All 11 hub genes were contained in this module.
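
The selection rule described above (keep genes that appear in the top list of at least five methods) amounts to counting votes across methods. The following is a minimal sketch with made-up top-3 lists for six methods; the rankings are hypothetical and are not taken from Supplementary file 1.

```python
from collections import Counter

def select_hub_genes(top_lists, min_methods=5):
    """Return genes that appear in the top list of at least min_methods methods.

    top_lists maps a method name to its list of top-ranked genes.
    """
    # set() guards against a gene being listed twice by one method.
    votes = Counter(gene for genes in top_lists.values() for gene in set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_methods)

# Hypothetical top-3 lists for six of the CytoHubba methods.
toy_top_lists = {
    "Degree": ["CDK1", "CCNB1", "FN1"],
    "MCC": ["CDK1", "CCNB1", "CDC20"],
    "MNC": ["CDK1", "CCNB1", "FN1"],
    "Closeness": ["CDK1", "FN1", "C3"],
    "Betweenness": ["CDK1", "FN1", "C3"],
    "Stress": ["CDK1", "CCNB1", "FN1"],
}
hub_genes = select_hub_genes(toy_top_lists, min_methods=5)
print(hub_genes)
```

In this toy input CDK1 is ranked by all six methods and FN1 by five, so only those two pass the threshold; CCNB1 (four methods) does not.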

GO and KEGG functional enrichment analysis for the selected hub genes

To explore the role of the 11 selected hub genes, we performed GO function and KEGG pathway analysis. The GO enrichment results comprised 187 biological process (BP) terms, 29 cellular component (CC) terms and 7 molecular function (MF) terms (Supplementary file 2). Figure 5 shows the top 10 terms for BP and CC and all the terms for MF and KEGG. Among these top terms, six were associated with GC and similar cancers: mitotic spindle checkpoint, anaphase-promoting complex, cell cycle, viral carcinogenesis, cellular senescence and p53 signaling pathway. Kim et al. found that frequent mutations of human Mad2, but not Bub1, in gastric cancers cause a defective mitotic spindle checkpoint⁸. Zhu et al. found that miR-383 inhibited the cell cycle progression of gastric cancer cells by targeting cyclin E2⁹. Uozaki et al. studied gastric cancer and viral carcinogenesis through epigenetic mechanisms¹⁰. Ji et al. found that microRNA miR-34 was a direct target of p53 and played an anti-cancer role downstream of the p53 pathway¹¹. Dong et al. identified a novel mechanism by which BRD4 regulated cancer cell proliferation, namely modulation of cellular senescence through the E2F/miR-106b-5p/p21 axis, and provided new insights into using BET inhibitors as potential anticancer drugs¹². Dai et al. found that activation of the anaphase-promoting complex by p53 induced a state of dormancy in cancer cells against chemotherapeutic stress¹³.
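
As a rough illustration of how a term can be scored as enriched in a small gene list, the sketch below computes a one-sided hypergeometric p-value, a test commonly used for over-representation analysis. This section does not state which test the enrichment tools applied, so the choice of test and all counts below are assumptions.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided p-value P(X >= k) for over-representation.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Hypothetical counts: 5 of 11 query genes fall in a term that covers
# 120 of 20,000 background genes.
p_value = hypergeom_enrichment_p(k=5, n=11, K=120, N=20000)
print(p_value)
```

In practice the p-values for all tested terms would also be corrected for multiple testing before terms are ranked.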

Survival analysis

Using the Kaplan–Meier plotter database, the prognostic value of the 11 hub genes was evaluated in GC patients. All 11 hub genes were up-regulated (\(\log_{2}\) fold change (FC) \(> 1\), Table 1). Ten of the 11 hub genes (C3, CDK1, AURKA, CCNA2, AURKB, CCNB1, BUB1B, MAD2L1, FN1 and UBE2C) showed a significant difference in overall survival between the high and low expression groups (p-value < 0.05). The survival rate was significantly lower in the high expression groups for C3, AURKB, FN1 and UBE2C, and significantly lower in the low expression groups for CDK1, AURKA, CCNA2, CCNB1, BUB1B and MAD2L1 (Fig. 6).
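
For readers unfamiliar with the survival curves being compared, the following is a minimal sketch of the Kaplan–Meier product-limit estimator on made-up follow-up data. It is not the Kaplan–Meier plotter implementation, and it omits the statistical test used to compare the high and low expression groups.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up time per patient.
    events: 1 if the event (death) was observed, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve

# Hypothetical group of five patients (times in months).
toy_times = [5, 8, 12, 12, 20]
toy_events = [1, 0, 1, 1, 0]
curve = kaplan_meier(toy_times, toy_events)
print(curve)
```

Running the estimator separately on the high and low expression groups of a gene gives the two curves that a plot such as Fig. 6 contrasts.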

Table 1 Regulation of 11 hub genes.

Verification based on another dataset

Based on dataset GSE19826, limma identified 333 DEGs. These DEGs were imported into the STRING V11 database to obtain a TSV file of protein interactions. Hub genes were again identified by 11 topological analysis methods, with the top 20 genes selected for each method (Supplementary file 3). FN1, CDK1, MMP9, CCNB1, AURKA, UBE2C, AURKB, CCNA2, FOXM1, ITGB1, EZH2, RRM2, THBS1, CXCL8, CDC20, COL1A1, BUB1B, MAD2L1, HIF1A and CDH2 were found in the intersection of at least five methods; these include 10 of the 11 hub genes identified above, all except C3. An important clustering module related to the 11 hub genes, with 133 nodes and 1077 edges, was extracted by CytoHubba (Supplementary Fig. 1a). We obtained the two clustering modules with the highest scores from the PPI network of all DEGs (Supplementary Fig. 1b,c) using the MCODE algorithm. It was found that all the 11 hub genes were contained in this module.
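
To make the "TSV file, then top 20 per method" step concrete, the sketch below parses a small tab-separated edge list and ranks genes by degree, the simplest of the topological measures. The column layout (two gene columns followed by a score, with a `#`-prefixed header) and the interaction scores are assumptions for illustration; a real export may contain more columns, and CytoHubba's other ten methods are not reproduced here.

```python
import csv
import io
from collections import defaultdict

def degree_ranking(tsv_text, top_n=20):
    """Rank genes by number of distinct interaction partners.

    Assumes the first two columns of each row are the interacting genes
    and that header lines start with '#'.
    """
    neighbors = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        a, b = row[0], row[1]
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Sort by descending degree, then by name for a stable order.
    ranked = sorted(neighbors, key=lambda g: (-len(neighbors[g]), g))
    return [(g, len(neighbors[g])) for g in ranked[:top_n]]

# Hypothetical edge list with made-up scores.
toy_tsv = (
    "#node1\tnode2\tcombined_score\n"
    "CDK1\tCCNB1\t0.999\n"
    "CDK1\tCDC20\t0.998\n"
    "CCNB1\tCDC20\t0.997\n"
    "CDK1\tBUB1B\t0.990\n"
    "FN1\tTHBS1\t0.950\n"
)
top_genes = degree_ranking(toy_tsv, top_n=3)
print(top_genes)
```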

Cox analysis and LASSO regression analysis

The correlations of DEG expression and other clinical features with overall survival (OS), as assessed by Cox analysis, are shown in Table 2. Six clinical features (age, stage, grade, T, M and N) and six hub genes (C3, CDK1, FN1, CCNB1, CDC20 and BUB1B) were significantly correlated with OS (p-value < 0.05) in univariate or multivariate Cox analysis. A LASSO regression model was then fitted to further screen the 11 hub genes. By varying \(\lambda\), the coefficient path plot was obtained (Fig. 7a), in which each curve traces the change of one regression coefficient. Most of the regression coefficients were shrunk to zero, which illustrates the strength of the model in dimensionality reduction and variable selection. Each point in Fig. 7b corresponds to a penalty value, and the position of the vertical dashed line indicates the number of genes selected under the optimal model. Five genes were retained under the optimal model: C3, CCNB1, CDC20, FN1 and MAD2L1. Therefore, seven of the 11 selected hub genes (C3, CDK1, FN1, CCNB1, CDC20, BUB1B and MAD2L1) were verified through Cox analysis or LASSO regression analysis and could be taken as independent prognostic biomarkers for GC.
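
The way a growing penalty \(\lambda\) compresses coefficients to zero can be seen in a deliberately simplified setting. The sketch below uses soft-thresholding, which gives the LASSO solution for linear regression with an orthonormal design; it is a stand-in for intuition only, not the penalized Cox model fitted in the paper, and the gene names and coefficients are made up.

```python
def soft_threshold(beta, lam):
    """Shrink beta toward zero by lam; values within [-lam, lam] become 0."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0

def lasso_path(unpenalized, lambdas):
    """Coefficients for each penalty value under the simplified setting."""
    return {
        lam: {gene: soft_threshold(b, lam) for gene, b in unpenalized.items()}
        for lam in lambdas
    }

# Hypothetical unpenalized coefficients.
toy_coefs = {"GENE_A": 0.9, "GENE_B": -0.5, "GENE_C": 0.2, "GENE_D": -0.05}
path = lasso_path(toy_coefs, [0.0, 0.1, 0.3, 0.6])
# Genes with a non-zero coefficient at each penalty value.
selected = {
    lam: sorted(g for g, b in coefs.items() if b != 0.0)
    for lam, coefs in path.items()
}
print(selected)
```

As the penalty grows, genes with weaker coefficients drop out first, which is the behavior a coefficient path plot such as Fig. 7a displays.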

Table 2 Univariate and multivariate Cox analysis.

Discussion

The study of molecular genetics and signal transduction pathways is helpful for further understanding the pathogenesis of GC and for its early diagnosis. Therefore, identifying DEGs for GC based on transcriptome microarray datasets may contribute to early diagnosis and to the development of effective therapies.

In our study, a total of 1599 DEGs were identified in the dataset GSE79973, of which 1269 were up-regulated and 330 were down-regulated. Based on these DEGs, a PPI network was constructed in which 1376 genes formed the network with 14,394 edges. Eleven hub genes were selected through the intersection of 11 topological analysis methods, and the clustering module related to these 11 genes was obtained by MCODE. To explore the role of the 11 selected hub genes in the pathogenesis of GC, we performed GO function and KEGG pathway enrichment analysis and found that the genes were enriched in a variety of functions and pathways. Kaplan–Meier analysis revealed that 10 of the 11 hub genes were related to the overall survival of GC patients. Cox analysis and LASSO regression analysis showed that seven of the 11 selected hub genes were significantly correlated with GC.

The overall aim of this study was to identify hub genes that may serve as potential biomarkers for GC diagnosis and therapy, and to further explore the potential mechanisms of GC by integrated profiling analysis. In our study, seven of the 11 selected hub genes (C3, CDK1, CCNB1, BUB1B, MAD2L1, FN1 and CDC20) were considered the most likely independent prognostic biomarkers associated with GC. Five of them (C3, CDC20, CCNB1, BUB1B and MAD2L1) were newly identified relative to previously published results using the same dataset.

The relevant literature suggests, from a biological point of view, that most of these hub genes play important roles in GC. Kitano et al. showed the synthesis and secretion of C3 by all the tested GC-derived cell lines in response to TNF, suggesting that C3 may be secreted in the gastric wall as part of its normal physiology, or as a result of tumour pathology, and thereby participate in local immune or inflammatory responses¹⁴. Lee et al. found that high expression of CDK1 in GC patients may imply a strong capacity for tumor invasion and that CDK1 was the target gene of miR-490-5p. Down-regulation of miR-490-5p and up-regulation of CDK1 can promote the proliferation of GC cells and the G1/S phase transition¹⁵. A study by Kidokoro et al. found that CDC20 was often up-regulated in many types of tumors and significantly inhibited by ectopic introduction of p53. Additionally, treatment of cancer cells with siRNA against CDC20 can induce G2/M arrest and inhibit cell growth¹⁶. CCNB1 knockdown by RNA interference was found to significantly inhibit the proliferation, migration and invasion of HCC cells¹⁷. Hudler et al. found that the expression of BUB1B in GC tissues was significantly higher than that in adjacent normal tissues, by a factor of nearly \(8.875 \pm 1.08\)¹⁸. Frio et al. found that at least one BUB1B mutation can result in autosomal recessively inherited susceptibility to gastrointestinal cancer, as do mutations in MUTYH and the mismatch-repair genes¹⁹. FN1 is a significant regulatory factor promoting the development and formation of various cancers, such as laryngeal and skin squamous carcinoma²⁰,²¹ and brain glioblastoma²². Zhang et al. found that miR-200c can inhibit the migration, proliferation and invasion of GC cells in vitro by directly binding to FN1, which indicated that miR-200c and FN1 may be potential biomarkers or therapeutic targets for GC²³. A study by Wang et al. confirmed the prognostic value of two key mitotic checkpoint genes, MAD2L1 and BUB1, which have been included in multiple gene expression signatures for breast cancer prognosis. They also found that these genes were biologically relevant to breast cancer progression, as suppression of their expression was associated with reduced tumor cell growth, migration and invasion²⁴.

This study provides important clues for exploring potential biomarkers and targets for the diagnosis, prognosis and treatment of GC. In future work, if conditions permit, we hope to conduct experiments to verify, from a biological point of view, the relationship between these hub genes and GC.

Conclusion

Through 11 topological analysis methods, we identified 11 hub genes for GC. We validated these hub genes through functional enrichment analysis, the clustering module with the highest score, the relevant literature, Kaplan–Meier analysis, Cox analysis and LASSO regression analysis. The results suggested that seven of the 11 selected hub genes (C3, CDK1, CCNB1, BUB1B, MAD2L1, FN1 and CDC20) may serve as potential prognostic biomarkers and therapeutic targets for GC. These results may provide a theoretical direction for future research on the molecular mechanisms of GC progression.

Materials and methods

Dataset

We collected the gene expression profiles of GC from the Gene Expression Omnibus database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE79973). This dataset includes 10 GC samples and 10 normal gastric samples. The platform was GPL570 (Affymetrix Human Genome U133 Plus 2.0). This dataset was used in previous studies⁴,⁶,²⁵,²⁶,²⁷, in which the authors mainly clarified the emerging role of long non-coding RNA (lncRNA) in cancer development, explored novel lncRNA candidates, identified key candidate genes and circRNA, and explored the molecular mechanism of GC through comprehensive analysis of mRNA and miRNA expression profiles. Here we aimed to identify potential prognostic biomarkers of GC based on 11 topological analysis methods and used the MCODE method and survival analysis to verify these hub genes.