A 6-Gene Risk Signature Predicts Survival of Glioblastoma Multiforme

Background This study aims to develop novel signatures for glioblastoma multiforme (GBM). Methods GBM expression profiles from The Cancer Genome Atlas (TCGA) were downloaded and DEGs between tumor and normal samples were identified by differential expression analysis (DEA). A risk signature was developed by applying weighted gene coexpression network analysis (WGCNA) and Cox regression analysis. Patients were divided into high and low risk group, followed by evaluating the performance of the signature via Kaplan-Meier curve analysis. In addition, the prognostic significance of the signature was further validated using an independent validation dataset from Chinese Glioma Genome Atlas (CGGA). DEGs between high and low risk group were subjected to functional annotation. Results A total of 748 DEGs were identified between primary tumor and normal samples. Following WGCNA and Cox regression analysis, 6 DEGs were identified and used to construct a risk signature. The signature showed high performance in both training and validation dataset. Subsequently, 397 DEGs were identified between high and low risk group. These DEGs were mainly enriched in terms related to calcium signaling, cAMP-mediated signaling, and synaptic transmission. Conclusions The risk signature may contribute to GBM diagnosis in future clinical practice.


Introduction
As the most malignant and frequently occurring tumor of the central nervous system (CNS), glioblastoma multiforme (GBM) has been considered as a Grade IV glioma according to World Health Organization (WHO) classification [1][2][3]. Prominent features of GBM include enhanced tumor cell proliferation, migration, and invasion [4]. Recently, the prognosis of GBM has been gradually improved as a result of advances in surgical resection, radiotherapy, and adjuvant chemotherapy [2]. However, GBM remains a deadly tumor with a median survival of only 15 months [2,5].
Gene expression aberrations are universal events in cancers and may contribute to cancer development and progression [6]. For example, amplification and overexpression of epidermal growth factor receptor (EGFR) is found in more than 30% of GBM [7]. It has been shown that GBM tumor cells with EGFR amplification have higher infiltration ability and inhibition of EGFR activity suppresses tumor cell growth [8]. In addition, the expression level of inhibitor of growth family member 4 (ING ), which may inhibit tumor cell growth by suppressing nuclear factor kappa B (NF-B) signaling, has been significantly reduced in GBM [9].
Recent advancements in bioinformatics and highthroughput sequencing have led to the identification of numerous tumor biomarkers, which may allow for more accurate outcome prediction and better management of GBM [10,11]. Sreekanthreddy et al. identified serum osteopontin (OPN) as a biomarker of GBM [12]. High level of OPN was confirmed as an indicator of poor outcome of GBM [12]. Colman et al. proposed a 9-gene signature as a predictor of GBM outcome, which showed a close association with markers of glioma stem like cells, including nestin and CD13 [13]. Besides, Arimappamagan et al. established a 14-gene prognostic signature with high accuracy in distinguishing high risk GBM patients from low risk patients [14]. These 2 BioMed Research International markers may be integrated into state-of-the-art diagnosis and decision-making processes in future clinical practice. Despite these progressions, more robust prognostic predictors are still needed for GBM treatment.
In our study, we analyzed the expression data of GBM from The Cancer Genome Atlas (TCGA) and Chinese Glioma Genome Atlas (CGGA) and identified differentially expressed genes (DEGs) by differential expression analysis. Subsequently, a 6-gene signature was identified by weighted gene coexpression network analysis (WGCNA) and Cox regression analysis. The signature showed high performance in predicting GBM clinical outcome and may serve as a novel predictor of GBM outcome in future clinical practice.  [16]. The selection criteria were fold discovery rate (FDR) < 0.5 and |log 2 FC (fold change)| > 0.5. Based on the expression values of the DEGs, two-way hierarchical clustering analysis was performed using pheatmap (version 1.0.8, https://cran.rproject.org/web/packages/pheatmap/index.html) [17].

Materials and Methods
. . WGCNA. WGCNA is a bioinformatic method based on high throughput expression data, which is used for the construction of coexpression network [18]. All the expressed genes from TCGA dataset were subjected to WGCNA (version 1.61, https://cran.r-project.org/web/packages/WGCNA/) [18]. Specifically, Pearson coefficients were generated for all pairwise comparisons of genes and the resulting coexpression matrix was transformed to an adjacency matrix using soft threshold power. The soft threshold power was the value where the square of the correlation coefficient between log 2 k (k, the number of connected nodes) and log 2 p(k) (p(k), the probability of k connected nodes) reached 0.9. Subsequently, in order to group genes into different modules, the dissimilarities between genes in the adjacency matrix were generated and hierarchical clustering was performed using the dynamic hybrid tree-cutting method (cutHeight = 0.9; the minimum module size = 50).
The DEGs were then mapped to the WGCNA modules. A hypergeometric algorithm was used to calculate the value of fold enrichment [19], which was defined as where indicated the total number of genes analyzed by WGCNA, indicated the number of genes in each module, indicated the number of DEGs and indicated the number of DEGs mapped to each module. The modules with fold enrichment > 1 and p < 0.05 were selected as disease associated modules.
. . Identification of GBM Associated Gene Signature. In order to identify a GBM related signature genes, univariate and multivariate Cox regression analysis were performed using the survival package (version 2.41-1, https://cran.rproject.org/web/packages/survival/index.html) of R3.4.1. The selection criterion was log-rank p < 0.05. Based on the expression levels of the signature genes, a risk signature was then formulated as where indicated the coefficient of derived from multivariate Cox regression whereas indicated the expression level of . The risk score of each sample was calculated according to the above formula. The median of risk score value was used as the threshold to divide the training samples into high and low risk group. The prognostic significance of the risk signature was assessed by Kaplan-Meier curve analysis using the survival package. Subsequently, the prognostic significance of the risk signature was validated using the CGGA dataset with the same procedure.

. . Functional Characterization of the Different Prognosis.
DEGs between high and low risk group of the training dataset were further screened by limma package. The selection criteria for DEGs were FDR < 0.5 and |log 2 FC| > 0.5. The resulting DEGs were subjected to functional annotation analysis using the clusterProfiler package (http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) of R3.4.1 [20]. The selection criterion for GO biological processes and KEGG pathways was FDR < 0.05.

Results
. . DEGs between Tumor and Normal Samples. Following differential expression analysis, a total of 748 DEGs (218 upregulated and 530 downregulated) (Figure 1(a)) were identified between primary tumor and solid tissue normal samples. Then the specificity of the DEGs was evaluated by two-way hierarchical clustering analysis. According our results, primary tumor and solid tissue normal samples were divided into two clusters, which showed completely different overall expression patterns (Figure 1(b)).  . . Disease Related WGCNA Modules and Genes. All the expressed genes from TCGA dataset were used as input for WGCNA. The soft threshold power used for matrix transformation was determined as 18, where the square of the correlation coefficient between log 2 k and log 2 p(k) reached 0.9 and the mean connectivity of the co-expression network was 1.0 (Figures 2(a) and 2(b)). According to the WGCNA results, totally 10 different disease related modules were obtained, except for the grey module (Figure 2(c)). The correlations between WGCNA modules and disease were shown as heatmap in Figure 2(d).
A total of 745 overlap genes were obtained after mapping DEGs to WGCNA modules (Table 1)  in blue and yellow module (fold enrichment > 1 and p < 0.05), each containing 150 and 136 DEGs, respectively ( Table 1). The expression level of these DEGs in primary tumor and solid tissue normal samples were shown as heatmaps in Figure 2(e).

. . GBM Associated
Risk score of each sample was calculated based on the above formulation. The median risk score was used as the cutoff to separate samples in TCGA dataset into high and low risk group. According to the Kaplan-Meier survival curve, the prognosis of low risk group was significantly better than that of high risk group (Figure 3(a); p < 0.05). The performance of the signature in predicting prognosis was further validated in the validation dataset using the same procedure. Consistent with the result of training dataset, low risk group also showed a significantly better prognosis than high risk group in the validation dataset (Figure 3(b)). Moreover, the validation dataset also showed a high area under the receiver operating characteristic curve (AUC), close to that of the training dataset (Figure 3(c)).

. . Functional Annotation of Feature Genes.
A total of 397 DEGs (371 upregulated and 26 downregulated) between low and high risk group of the training dataset were screened by differential expression analysis (Figure 4(a)). The expression patterns of these DEGs were shown as heatmap in Figure 4(b). In order to interpret the biological functions and pathways perturbated by the signature genes, the DEGs between low and high risk group were used as inputs for GO and KEGG analysis. The enriched KEGG pathways were "hsa04080: Neuroactive ligand-receptor interaction" and "hsa04020: Calcium signaling pathway" (Figure 4(c)). The main enriched GO biological processes included "GO:0019933∼cAMP−mediated signaling" "GO:0006811∼ ion transport" "GO:0007268∼synaptic transmission" and "GO:0019226∼transmission of nerve impulse" (Figure 4(c)).

Discussion
GBM is the most malignant brain tumor, efficient management of which requires robust biomarkers [11]. In the present study, we first analyzed the expression dataset of GBM and identified DEGs between primary tumor and solid tissue normal samples. Based on the DEGs, we successfully developed a risk signature, which was efficient and reliable in predicting the clinical outcome of GBM.
Signatures composed of multiple genes are generally more robust and more accurate than single-gene biomarker in predicting tumor outcome [14]. The risk signature developed in our study consisted of 6 genes, including BPIFB , XIRP , LRRC , SDR C , HOXA , and NELL . GBM samples could be divided into high and low risk group by applying the signature. As expect, our results indicated that low risk groups showed significantly better overall survival than high risk groups in both training and validation dataset. Consequently, the 6-gene signature may provide useful information for clinical practice if incorporated into future decision-making processes.
Among the signature genes, HOXA is a Homeobox gene overexpressed in multiple cancers and has been shown to be associated with the progression of hepatocellular carcinoma, pancreatic cancer, esophageal squamous cell carcinoma and GBM [21]. It has been reported that HOXA may promote GBM progression through activation of Wnt/beta-catenin and TGF-signaling pathway, whereas downregulation of HOXA may suppress GBM cell invasion and decrease tumor growth [22]. Though few studies have investigated roles of BPIFB , XIRP and NELL in GBM, all the three genes may have potential roles in other cancers. BPIFB has been reported to be overexpressed in gastric cancer, which may result in expression alteration of epithelial-mesenchymal transition (EMT) markers [23], and XIRP has been shown to be frequently mutated in lung adenocarcinoma [24]. As a gene encoding a secreted protein regulating skeletal ossification [25], NELL has also been proposed to be a tumor suppressor gene in colon cancers [26]. Therefore, dysregulation of the BPIFB , XIRP and NELL may also underly GBM tumorigenesis. While the roles of the remaining two signature genes in GBM tumorigenesis remains unclear, both genes have important physiological roles. LRRC is required for early heart development [27,28] and SDR C encodes a retinol dehydrogenase, which may be essential for retinoic acid biosynthesis [29]. Considering their important physiological roles, they may also potentially be involved in GBM tumorigenesis.
Dysregulation of cancer related pathways and functions is common in cancers [30]. Our functional annotation analysis showed that DEGs between high and low risk group were significantly enriched in pathways and functions related to synaptic transmission, indicating that synaptic function in GBM patients may be deregulated. In addition, DEGs were also enriched in calcium signaling and cAMP-mediated signaling. Ca 2+ is an essential regulator for neurogenesis and synaptic transmission, and the deregulation of Ca 2+ signaling may advance GBM progression [31,32]. It has been proposed that manipulating Ca 2+ signaling may benefit the management of GBM [31]. Suppression of cAMP signaling pathway has been shown to be a common feature in tumorigenesis and activation of cAMP signaling in GBM may inhibit tumor cell growth and induce cell apoptosis [33]. Considering the prognostic differences between high and low risk group, we speculated that deregulation of calcium signaling and cAMP-mediated signaling may play important roles in the development and progression of GBM.
One main advantage of our study was that a robust GBM risk signature was developed through a combination of WGCNA and Cox regression analysis. The signature was efficient and reliable for both training and validation dataset when applied in outcome prediction. In addition, we also identified BPIFB , XIRP , NELL , LRRC , and SDR C as novel GBM biomarkers, as they have never been reported to be associated with GBM development and progression. However, we also noticed some limitations in our study. The samples included in our study was insufficient and more samples are required in future studies. Additionally, experimental studies should be designed and performed to confirm the involvement of the novel biomarkers in GBM and to provide an insight into corresponding molecular mechanisms.

Conclusion
In conclusion, we analyzed the expression profiles of GBM and identified DEGs between primary tumor and solid tissue normal samples. A 6-gene risk signature consisting of BPIFB , XIRP , LRRC , SDR C , HOXA , and NELL was further developed for outcome prediction. The signature may contribute to future decision-making processes in clinical practice.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.