Construction of a Prognostic Gene Signature Associated with Immune Infiltration in Glioma: A Comprehensive Analysis Based on the CGGA

Background Tumor microenvironment (TME) is closely related to the progression of glioma and the therapeutic effect of drugs on this cancer. The aim of this study was to develop a signature associated with the tumor immune microenvironment using machine learning. Methods We downloaded the transcriptomic and clinical data of glioma patients from the Chinese Glioma Genome Atlas (CGGA) databases (mRNAseq_693). The single-sample Gene Set Enrichment Analysis (ssGSEA) database was used to quantify the relative abundance of immune cells. We divided patients into two different infiltration groups via unsupervised clustering analysis of immune cells and then selected differentially expressed genes (DEGs) between the two groups. Survival-related genes were determined using Cox regression analysis. We next randomly divided patients into a training set and a testing set at a ratio of 7 : 3. By integrating the DEGs into least absolute shrinkage and selection operator (LASSO) regression analysis in the training set, we were able to construct a 15-gene signature, which was validated in the testing and total sets. We further validated the signature in the mRNAseq_325 dataset of CGGA. Results We identified 74 DEGs associated with tumor immune infiltration, 70 of which were significantly associated with overall survival (OS). An immune-related gene signature was established, consisting of 15 key genes: adenosine triphosphate (ATP)-binding cassette subfamily C member 3 (ABCC3), collagen type IV alpha 1 chain (COL4A1), podoplanin (PDPN), annexin A1 (ANXA1), COL4A2, insulin-like growth factor binding protein 2 (IGFBP2), serpin family A member 3 (SERPINA3), CXXC-type zinc finger protein 11 (CXXC11), junctophilin 3 (JPH3), secretogranin III (SCG3), secreted protein acidic and rich in cysteine (SPARC)-related modular calcium-binding protein 1 (SMOC1), Cluster of Differentiation 14 (CD14), COL1A1, S100 calcium-binding protein A4 (S100A4), and transforming growth factor beta 1 (TGF-β1). The OS of patients in the high-risk group was worse than that of patients in the low-risk group. GSEA showed that interleukin-6 (IL-6)/Janus kinase (JAK)/signal transducer and activator of transcription (STAT3) signaling, interferon gamma (IFN-γ) response, angiogenesis, and coagulation were more highly enriched in the high-risk group and that oxidative phosphorylation was more highly enriched in the low-risk group. Conclusion We constructed a stable gene signature associated with immune infiltration to predict the survival rates of glioma patients.


Introduction
Glioma is the most common type of primary brain tumor. As it is highly infiltrative and aggressive, survival time is short, even after combined treatment with surgery, radiotherapy, and chemotherapy. e World Health Organization (WHO) classification system divides glioma into low-grade (I-II) and high-grade (III-IV) based on the absence or presence of anaplastic features and histological characteristics with molecular features, including isocitrate dehydrogenase 1 (IDH1) mutation and 1p/19q codeletion [1,2]. Despite all that, prognostic prediction and improvement of treatment efficiency in glioma remain challenging. erefore, identifying novel prognostic biomarkers and therapeutic targets is imperative in this cancer.
Recent research has shown that immune infiltration plays a crucial role in cancer tumorigenesis and progression [3]. Tumor immune infiltration involves multiple immune cells whose functions are significantly altered in glioma. e research of Tian et al. notes that M0 and M2 tumor-associated macrophages (TAMs) play a protumor role, while M1 TAMs play an antitumor role [4]. e glioblastoma microenvironment has been demonstrated to increase counts of myeloid-derived suppressor cells, which block T cell and natural-killer (NK) cell functions, resulting in an immunocompromised microenvironment [5]. By secreting cytokines such as transforming growth factor beta (TGF-β), the glioma microenvironment favors recruitment and survival of regulatory T cells (Tregs), which are associated with worse prognosis in many cancers [6][7][8]. erefore, targeting these cells in treatment is expected to shift the glioma tumor microenvironment (TME) from a "cold" to a "hot" phenotype [9].
With the rapid development of transcriptome sequencing, analytical methods or databases such as CIBERSORT, single-sample Gene Set Enrichment Analysis (ssGSEA), ESTIMATE algorithm, and Tumor Immune Estimation Resource (TIMER) have been developed to evaluate tumor immune infiltration [10][11][12][13]. Signatures based on immune infiltration also show promising predictive value in tumor prognosis and immunotherapeutic response. Most studies on immune-related prognostic biomarkers in glioma have focused on tumor-infiltrating cells. However, the definition of immune cell type depends on protein molecular-weight markers; immune-related biomarkers rather than immune cells constitute another way to view the TME. Tian et al. constructed a classifier of immune-relevant long noncoding ribonucleic acids (lncRNAs) using immune gene sets from the Molecular Signatures Database (MSigDB) to predict glioma prognosis [14]. Based on the ESTIMATE algorithm, Zhang et al. constructed a four-gene signature involved in immune infiltrssation in glioblastoma [15]. Xiao et al. provided an immune-related gene model for risk stratification in lowergrade glioma using CIBERSORT and TIMER [16]. To date, no signature associated with immune infiltration in glioma based on ssGSEA has been explored.
In this study, we conducted a comprehensive analysis of the immune infiltration landscape based on a dataset from the Chinese Glioma Genome Atlas (CGGA) databases and established a predictive model. Our results demonstrated that the risk score of the signature was an independent prognostic factor for glioma patients.

Data Acquisition.
We drew the transcriptomic and clinical data of glioma patients from the public CGGA (http://www.cgga.org.cn/) databases mRNAseq_693 and mRNAseq_325 [17,18]. Patients with complete overall survival (OS) information and gene expression profiles were included in our study. We used fragments per kilobases per million reads (FPKM) to estimate RNA expression.

Abundance of Immune Cells.
In this study, we applied ssGSEA to quantify the relative abundance of 28 immune cells based on gene sets obtained from the TISIDB database, an integrated repository portal for tumor immune system interactions [12]. We evaluated 28 immune cell types, including monocytes, central-memory Cluster of Differentiation 4 (CD4) T cells, plasmacytoid dendritic cells, immature dendritic cells, activated dendritic cells, CD56 dim natural-killer (NK) cells, CD56 bright NK cells, cδ T cells, NK cells, myeloid-derived suppressor cells, central-memory CD8 T cells, effector memory CD4 Tcells, Tfollicular helper (T fh ) cells, effector memory CD8 T cells, macrophage, activated CD8 T cells, NK T cells, type 1T helper (T h 1) cells, regulatory T cells (Tregs), T h 2 cells, memory B cells, immature B cells, T h 17 cells, mast cells, activated CD4 T cells, eosinophils, activated B cells, and neutrophils. Unsupervised classification of the glioma cohort was conducted through k-means clustering on ssGSEA enrichment score [19,20]. We used the R software (https://www.r-project.org) package ConsensusClusterPlus to implement an unsupervised consensus approach and divided glioma patients into two immune-infiltrating subtypes according to infiltrating densities of immune cells [21].

Differentially Expressed lncRNAs and mRNAs.
To identify lncRNAs and messenger RNAs (mRNAs) associated with immune infiltration, we compared gene expression between different samples using the ballgown method [22]. An adjusted P value of <0.05 and log2 fold change (| log2FC|) > 1 were set as the cutoff criteria for differentially expressed lncRNAs, while P < 0.05 and |log2FC| > 2 were set as the cutoff criteria for differentially expressed mRNAs. Unsupervised classification of the glioma cohort was conducted via k-means clustering on the expression of differentially expressed mRNA.

GO and KEGG Pathway Analysis.
To analyze the possible functions of these differentially expressed mRNAs, we used the R package clusterProfiler to conduct Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses [23,24]. GO enrichment was carried out from the following three aspects: biological processes (BP), cellular components (CC), and molecular functions (MF). KEGG enrichment was mainly focused on molecular mechanisms and metabolic pathways. e top 15 items of GO categories and KEGG pathways are shown in the bubble maps.

Univariate Cox and LASSO Regression Analyses.
To obtain immune-and survival-related genes, we analyzed differentially expressed mRNAs via univariate Cox regression using the R package survival. Candidate mRNAs were selected if P < 0.01. Using R software, we randomly divided CGGA glioma patients into a training set and a testing set at a ratio of 7 : 3. e abovementioned immune-    and survival-related genes were integrated into least absolute shrinkage and selection operator (LASSO) regression analysis in the training set as calculated by the R package glmnet [25]. e signature to calculate the risk score of each patient equaled the sum of the respective LASSO coefficients multiplied by mRNA expression level. We then validated the signature in the testing and total sets. Next, all patients were split into low-risk and high-risk groups according to median risk score.
2.6. GSEA. Based on hallmark gene sets, we used GSEA to identify differentially regulated pathways between the low-risk and high-risk groups [26,27]. Enriched pathways with a normal P value of <0.05 and false discovery rate (FDR) Q-value of <0.25 were considered statistically significant.

Statistical Analyses.
For comparisons between two groups, we used Student's t-test and Wilcoxon test. For comparisons among three groups, we used one-way ANOVA and Kruskal-Wallis test. Spearman's correlation coefficient (SCC) was used to calculate coefficients between two parameters and Fisher's exact test to estimate correlations between mRNAs and the terms of GO and KEGG analyses. We used the univariate Cox model to determine independent prognostic factors. Receiver operating characteristic (ROC) analysis and area under the curve (AUC) were used to evaluate the specificity and sensitivity of the signature. e survival difference between the two groups was demonstrated by Kaplan-Meier (KM) curves.

e Landscape of Tumor-Infiltrating Immune Cells in
Glioma. We analyzed enrichment levels of 28 immune cells in glioma using immune-associated gene sets in ssGSEA.
e comprehensive immune landscape of the glioma cohort is depicted in the form of a heatmap (Figure 1(a)). Results revealed that densities of monocytes, CD56 dim NK cells, plasmacytoid dendritic cells, immature dendritic cells, and central-memory CD4 T cells were high. In contrast, densities of eosinophils, neutrophils, and activated B cells were low. Figure 1(b) indicates the correlation coefficient among the 28 immune cell types. We can find some highly correlated immune cell couples in this figure, such as T h 2 cells/CD56 NK cells, monocyte/memory B cells and Tregs/activated dendritic cells. We clustered glioma samples according to the ssGSEA scores of these immune-associated gene sets and divided them into two distinct clusters: high_infiltration and low_infiltration (Figures 1(c)-1(d)). Results confirmed that patients with high-infiltration TMEs had worse clinical outcomes than those with low-infiltration TMEs (Figure 1(e)). ese data also proved the importance of subgrouping as a basis for immunogenomic profiling.

Identification of Differentially Expressed mRNAs.
By comparing differentially expressed mRNAs between the low-and high-infiltration groups, we obtained 50 immunerelated mRNAs that were upregulated in the high-infiltration group and 24 that were upregulated in the low-infiltration group (Table 2; Figure 3(a)). Neuron-specific gene 2 (NSG2), glutamate ionotropic receptor N-methyl-D-aspartate type subunit 1 (GRIN1), complexin-2 (CPLX2), chromogranin A (CHGA), and synaptosomal-associated protein, 25 kDa (SNAP25), were the top five in the low-infiltration group, while chitinase-3-like proteins 1 and 2 (CHI3L1, CHI3L2), Spen paralog and ortholog C-terminal domain containing 1 (SPOCD1), lactoferrin (LTF), and serpin family A member 3 (SERPINA3) were the top five in the highinfiltration group. Next, through unsupervised clustering of differentially expressed mRNAs, we classified patients into two gene subgroups: G1 and G2 (Figure 3(b)). e survival probability of the G1 subgroup was remarkably higher than that of the G2 subgroup ( Figure 3). After comparing the relative quantity of the 28 immune cells between the two subgroups, we found that the infiltration level of most immune cells was apparently higher in the G2 than the G1 subgroup ( Figure 3).

Construction of an Immune Infiltration-Related Gene
Signature. We conducted univariate Cox regression analysis of the 74 putative immune-related genes. A total of 70 genes significantly associated with survival were identified; they are listed in Table 3 (P < 0.01). We randomly divided glioma patients into a training set and a testing set at a ratio of 7 : 3. Next, we integrated the 70 genes into the LASSO Cox regression algorithm. A well balanced prognostic model was obtained after 1000 iterations, and we selected 15 immunerelated genes to construct the signature (Figures 4(a)-4(d)). e full names of the genes and their LASSO coefficients are listed in Table 4, and the survival curves of the genes are displayed in Figure S1. Next, we calculated risk score for each glioma patient and categorized all patients into low-risk and high-risk groups based on median risk score. KM curve analysis demonstrated that patients in the high-risk group had significantly worse OS than those in the low-risk group (Figure 4(e)). After application of the ROC curve, the AUC for 4-year prediction was 0.763 in the training set, indicating excellent prediction efficiency (Figure 4(h)). en, to validate this signature, we also calculated the risk scores of patients in the testing and total sets. Survival time was remarkably different between the low-risk and high-risk groups, both in the testing set and the total set (Figures 4(f)-4(g)). e AUC for 4-year prediction was 0.743 in the testing set and 0.754 in the total set (Figures 4(i)-4(j)).

Application of the Immune Signature in Glioma Subgroup
Analysis. We further explored the prognostic value of the signature in glioma subgroups with various pathological and molecular features. Stratification analysis was carried out according to WHO grade, IDH1 status, 1p/19q codeletion status, and O6-methylguanine-deoxyribonucleic acid methyltransferase (MGMT) promoter methylation status, which were significantly related to OS. In all cohorts, high-risk patients had shorter OS than low-risk ones (Figures 5(a)-5(i)). ese results indicated that classification based on the immune signature could accurately identify patients with discouraging prognoses, irrespective of pathological and molecular characteristics. In addition, we analyzed differences in risk scores in glioma subgroups.

Relationship between Risk Score and Infiltrating Immune
Cells. We further analyzed correlations between the risk score of the 15-gene signature and abundances of immune cells.
e risk score was positively correlated with abundances of most immune cells, such as central-memory CD4 T cells and cδ T cells, and negatively correlated with those of monocytes, effector memory CD4 T cells, activated B cells, and CD56 bright NK cells. is further proved that the signature had the potential to reflect the TME characteristics of glioma ( Figure 7).
For MF, processes such as ECM structural constituent conferring tensile strength, platelet-derived growth factor (PDGF) binding, and protease binding were significantly enriched by these genes. In KEGG analysis, the targeted genes were highly enriched in essential pathways closely correlated with cancer progression, such as ECM receptor interaction, protein digestion and absorption, and phagosome. Moreover, we also used GSEA data to identify differences in signaling pathways between the low-risk and high-risk groups (Figure 9). We found that the high-risk group was enriched in interleukin-6 (IL-6)/Janus kinase (JAK)/signal transducer and activator of transcription (STAT3) signaling, interferon gamma (IFN-c) response, angiogenesis, and coagulation, which were mainly concentrated in immune-associated pathways. In contrast, the low-risk group was enriched in oxidative phosphorylation, which was correlated with aerobic metabolism.

Validation of the Signature in an External Independent
Cohort. To validate the robustness of the signature, we calculated patients' risk scores in the external-validation set (CGGA mRNAseq_325). en, we divided these patients into low-and high-risk groups according to median risk score. KM curve analysis revealed that our signature could significantly distinguish prognosis between the two groups and that the prognoses of patients with high-risk scores were worse than those of patients with low-risk scores (Figure 10(a)). Via ROC analysis, we found that the 4-year AUC was 0.825 in the external cohort, which indicated a good predictive ability (Figure 10(b)). Furthermore, the survival probability of the high-and low-risk groups was significantly different in glioma subgroups stratified by WHO grade (grade III and grade IV), IDH1 status, 1p/19q codeletion status, and MGMT promoter methylation status (Figures 10(d)-10(k)). However, there was no significant difference in WHO grade II, and the failed statistical analysis was potentially due to limited sample size (Figure 10(c)).

Discussion
Numerous studies over the past decade have shown that the TME is a critical regulator of tumor progression and therapeutic effectiveness in glioma [32]. Infiltrative inflammatory cells; tissue-resident cells such as astrocytes, microglia, and neurons; and brain vasculature constitute the glioma microenvironment [33]. We investigated the immunogenomic landscape of glioma based on data from the CGGA database and then developed an immune-relevant prognostic gene signature for glioma patients.
As we know, the infiltration of immune cells into the TME reflects immune status and can predict prognosis in cancer [34]. In this study, we evaluated the relative abundance of 28 infiltrating immune cells in the CGGA glioma cohort.
ere were higher proportions of monocytes, central-memory CD4 T cells, plasmacytoid dendritic cells, CD56 dim NK cells, and immature dendritic cells and lower proportions of eosinophils, neutrophils, and activated B cells.
en, we split patients into two subtypes (high_infiltration and low_infiltration) with obviously different infiltration levels of immune cells. It is reported that patients with high immune infiltration tend to have better outcomes than patients with low immune infiltration in bladder, breast, and gastric cancers [35][36][37]. However, glioma patients in our high-infiltration group showed worse OS than patients in the low-infiltration group. is might have been because immune cells in the glioma microenvironment mainly suppressed immune-mediated destruction of tumor cells [38]. We can also conclude that    the grouping method based on the immune landscape was effective in reflecting the degree of immune infiltration and in predicting glioma prognosis. Our prognostic signature was positively correlated with quantity of immunosuppressive cells (such as myeloid-derived suppressor cells, Tregs, and NK T cells) as well as that of immune-stimulatory cells (such as central-memory CD4 T cells, cδ T cells, and NK cells) [39,40]. is result indicated that the regulatory system of the glioma microenvironment was special and complicated, which could also explain the failure of immunotherapy for glioma [41].
To further demonstrate the relationship between the immune landscape of the TME and the prognoses of glioma patients, we constructed an immune-relevant signature by analyzing the CGGA glioma database in detail. A significant difference between the low-and high-risk groups was observed when we applied this 15-gene signature in the training set using ROC and KM curve analyses. e predictive ability of the signature was also validated in the testing set and the external cohort, showing its effectiveness and breadth in predicting prognosis in glioma.
Our results revealed that adenosine triphosphate (ATP)-binding cassette subfamily C member 3 (ABCC3), collagen type IV alpha 1 chain (COL4A1), podoplanin (PDPN), annexin A1 (ANXA1), COL4A2, insulin-like growth factor binding protein 2 (IGFBP2), and SERPINA3 were positively associated with risk score, while CXXCtype zinc finger protein 11 (CXXC11), junctophilin 3 (JPH3), secretogranin III (SCG3), secreted protein acidic and rich in cysteine (SPARC)-related modular calciumbinding protein 1 (SMOC1), Cluster of Differentiation 14 (CD14), COL1A1, S100 calcium-binding protein A4 (S100A4), and transforming growth factor beta 1 (TGF-β1) were negatively associated with risk score. Some of these genes have been reported to play essential roles in regulating tumor immune response. For instance, high expression of PDPN not only predicts poor survival outcomes but also shows correlations with immune markers of TAMs and T cell exhaustion in gastric cancer [42]. IGFBP2 induced immunosuppression by decreasing CD8 + T and CD19 + B cells and increasing CD163 + M2 TAMs; blocking this gene suppressed tumor growth and improved survival in a glioblastoma mouse model [43]. TGF-β includes three subtypes; Christian et al. found that TGF-β1 and TGF-β2 mRNAs were strongly expressed in glioblastoma, while TGF-β3 mRNA was increased particularly in astrocytomas and anaplastic astrocytomas [44]. ese subtypes of TGF-β play important roles in cancer development processes, including cell invasion, immune suppression, and microenvironment modification [45]. Targeted inhibition of ANXA1 can reduce the function of Tregs and shrink breast tumors [46]. COL4A2 expression is positively correlated with the presence of macrophage and dendritic cell infiltration in cervical-cell cancer, while COL1A1 is positively correlated with tumor infiltration levels of macrophages and CD4 + T cells in bladder cancer [47,48].
LncRNAs have also been reported to play critical roles in the immune response to cancer [49,50]. We identified 62 differentially expressed lncRNAs correlated with immune infiltration by comparing lncRNA expression levels in the two different immune infiltration groups. Some of these lncRNAs are reported to take part in regulating tumor immune response. RP5-1119A7.17 is negatively associated with histone 1 (HIST1)/histone H2B type 1-K (H2BK), which are involved in immune response and cell growth in low-grade glioma [51]. e immunosuppression of CD39 + Tregs relies on a distinct transcriptional program with a low level of neurofilament light (NEFL) [52]. NEFL is also considered an immune gene associated with prognosis in triple-negative breast cancer [53]. Our GSEA results demonstrated that patients in the highrisk group were significantly enriched in immune-associated pathways such as IL6/JAK/STAT3 signaling, IFN-c response, angiogenesis, and coagulation. In contrast, patients of the low-risk group were significantly enriched in oxidative phosphorylation. IL6/JAK/STAT3 signaling promotes tumor proliferation, invasiveness, and metastasis and suppresses antitumor immune response in the TME [54]. IFN-c response has a controversial role in antitumor immunity. On one hand, IFN-c might suppress the immune response and further contribute to tumor growth and metastasis; on the other hand, it might stimulate host immune response and increase the efficacy of immunotherapy by activating macrophages, dendritic cells, and T lymphocytes in the TME [55]. Previous studies have shown that extravascular coagulation and angiogenesis promote tumor immune evasion and that pharmacological targeting of the coagulation signaling pathway with factor Xa (FXa) could enhance tumor antigen presentation and cytotoxic tumor killing [56][57][58]. Oxidative phosphorylation is associated with the energy supply of tumor cells. When it is activated, the production of lactic acid is inhibited, which further alkalifies the outside environment and hinders the spread of tumors [59].
Although the immune-related signature was stable in our study, our research had its limitations. First, we excluded patients whose OS was not available, meaning that lost-to-follow-up bias cannot be eliminated. Second, some potential prognostic factors were missing in our signature, such as age, genetic alterations, WHO grade, and treatments.
ird, further basic experiments are required to validate the functions of the genes in the signature.

Conclusion
is investigation comprehensively analyzed RNAseq data from the CGGA database and established an immune-related gene signature with the ability to predict survival outcomes in glioma. We hope that the results of our study can help identify critical genes and pathways associated with glioma and provide potential immune targets for treatment in the future.
Data Availability e transcriptomic and clinical data used to support this study are deposited in CGGA (http://www.cgga.org.cn/).

Conflicts of Interest
e authors declare that they have no conflicts of interest.