Weighted gene correlation network analysis identifies microenvironment-related genes signature as prognostic candidate for Grade II/III glioma

Glioma is the most common malignant tumor in the central nervous system. Evidence shows that clinical efficacy of immunotherapy is closely related to the tumor microenvironment. This study aims to establish a microenvironment-related genes (MRGs) model to predict the prognosis of patients with Grade II/III gliomas. Gene expression profile and clinical data of 459 patients with Grade II/III gliomas were extracted from The Cancer Genome Atlas. Then according to the immune/stromal scores generated by the ESTIMATE algorithm, the patients were scored one by one. Weighted gene co-expression network analysis (WGCNA) was used to construct a gene co-expression network to identify potential biomarkers for predicting the prognosis of patients. When adjusting clinical features including age, histology, grading, IDH status, we found that these features were independently associated with survival. The predicted value of the prognostic model was then verified in 440 samples in CGGA part B dataset and 182 samples in CGGA part C dataset by univariate and multivariate cox analysis. The clinical samples of 10 patients further confirmed our signature. Our findings suggested the eight-MRGs signature identified in this study are valuable prognostic predictors for patients with Grade II/III glioma.


INTRODUCTION
Gliomas are the most common malignant tumor type of the central nervous system which account for about 80% of primary malignant brain tumors [1]. According to the criteria of the World Health Organization (WHO), gliomas are divided into four grades (WHO Grade I, II, III and IV) [2]. WHO Grade I gliomas, mainly including pilocytic astrocytomas, are considered to be benign tumors [3]. WHO grade II/III gliomas tumors that include infiltrative astrocytomas and oligodendrogliomas mostly occur in the adult cerebral hemisphere [4]. The median overall survival (OS) of patients with Grade II glioma patients is about 6-12 years, while the patients with Grade III glioma is reduced to 3 years (30-40 months) [5,6]. WHO grade IV glioblastoma multiforme (GBM), the most malignant glioma type with, accounts for approximately 50% of glioma cases, and has a median survival time of 14.2 months [7]. Although surgical treatment combined with radiotherapy and chemotherapy methods has made great strides, the overall prognosis of patients with glioma has not significantly been improved [8]. The social burden of Grade II/III gliomas is more serious due to the poor quality of life caused by surgery and the high recurrence rate. However, the current research is mostly focused on the GBM, but research on the Grade gliomas is very limited.
In recent years, researchers have made remarkable progress in the field of immunotherapy in blood and many other solid tumors, which provide a new choice AGING for the glioma therapy [9][10][11]. Through immunotherapy, especially immune checkpoint inhibitors, significant responses were observed in different types of tumors [12,13]. More and more evidence show that the effect of immunotherapy is not only associated with tumor cells, but also to the tumor microenvironment (TME). The tumor microenvironment refers to the surrounding tissue around tumor cells, including cytokines hormones, extracellular matrix and a series of immune and stromal cells, which have important influence on gene expression and clinical prognosis [14]. In view of these latest findings, novel therapies for immune responses are at present being developed and glioma microenvironment as a potential therapeutic target has been paid more and more attention [15][16][17].
With the development of sequencing technology, highthroughput genomic analysis platforms provide promising tools for clinical oncology research. Compared with traditional clinical characteristics, genomics indicators are uniform and objective. Multiple genes signature can be used in combination to provide additional prognostic information [18][19][20][21]. Yoshihara et al. calculate the expression of specific molecular biomarkers in immune cells and stromal cells to generate an estimation algorithm with immune/stroma/ESTIMATE score to predict the matrix components and immune cell infiltrating cells in TME [22]. The immune score is closely related to the degree of immune cell infiltration. The higher the scores, the more immune cells infiltration. The stromal score stands for tumor matrix components. The higher the score, the more the matrix around the tumor. The ESTIMATE score is the sum of the immune score and the stromal score. The higher the score, the lower the purity of the tumor. This method is the most widely used and most convincing method currently used, and it has been experimentally confirmed that it is consistent with the actual situation of the tissue surrounding the tumor [23,24].
The weighted gene co-expression network analysis (WGCNA) is a powerful tool to construct free-scale gene co-expression networks which is widely used to analyze large-scale datasets [25]. Thus, we establish Grade II/III glioma prognosis signature, and investigated the clinical application of the model using the microenvironment-related genes (MRGs).

RESULTS
Immune scores and stromal scores are significantly associated with Grade II/III glioma patients' overall survival 459 patients with Grade II/III glioma in the TCGA database with a follow-up time greater than 3 months were included in the present research. The stromal scores ranged from -1816.74 to 1716.82, immune scores were -1722.93 ~ 2189.02, and ESTIMATE scores were ~3512.4 to 3775.10 according to the ESTIMATE algorithm. K-M analysis was done to reveal the correlation between the scores and survival by dividing 459 patients into high and low score groups with the median score (low score group:230; high score group:229). The overall survival of Grade II/III glioma patients with low immune scores was significantly higher than that of patients with high immune scores (P=0.002) ( Figure 1A). The same trend occurs in the high stromal (P < 0.001) ( Figure 1B) and ESTIMATE (P=0.0012) ( Figure 1C) score group.

Co-expression module construction
459 samples from TCGA were clustered using the average linkage method to assess microarray quality and filter outliers. The power of β = 6 was chose as the softthreshold (scale-free R2 = 0.97; Figure 2A-2C). 7 modules were clustered by average linkage hierarchical clustering. Yellow module was selected for further analysis, which was highly correlated with ESTIMATE AGING score ( Figure 2D). The enrichment analysis conducted in this study indicated that the genes in the yellow module were enriched in the positive regulation of leukocyte chemotaxis, myeloid leukocyte differentiation, and T cell proliferation ( Figure 3). We did not select this grey module for further analysis for it contained too many genes to be processed in the next step, although the module was also significantly related to the ESTIMATE scores.

Identification of a MRGs signature
The 46 genes in the yellow module were considered to be related to the ESTIMATE scores and were conduct to the further research. Then Lasso-penalized Cox analysis with 10-fold cross-validation was performed to narrow the genes for prediction of the OS ( Figure  4A). Eight microenvironment-related genes were identified ( Figure 4B). Kaplan-Meier analysis in the TCGA and CGGA part C dataset indicated that all the 8 genes significant correlation with survival time of the patients (Figures 5, 6). The predictive model was defined as the linear combination of the expression levels of the 8 MRGs signature weighted by their relative coefficient in the multivariate Cox regression model, as risk score = (-0.092 * expression level of HCK) + (0.101 * expression level of HAVCR2) + (0.063 * expression level of CD37) + (−0.128 *   AGING expression level of LPAR5) + (0.290 * expression level of NAGA) + (−0.005 * expression level of C1QC) + (0.023 * expression level of FCER1G) + (−0.040 * expression level of AIF1). Risk score is determined by MRGs combined with survival time in TCGA dataset. The higher the risk score, the greater the ESTIMATE score, which means more immune cell infiltration and more stromal composition, and the shorter the survival time.

Analysis of the MRGs signature in the TCGA cohort and CGGA cohort
The patients in the TCGA cohort were divided into high-risk group (n=229) and low-risk group (n=230) according to the risk score. We then evaluated the prognostic difference between the two groups by Kaplan-Meier curve based on the log-rank test. The results showed that the prognosis of the high-risk group was far worse than that of the low-risk group (P<0.001) ( Figure 7A). We also made a heat map to describe the expression modules of MRGs in the two groups ( Figure  7B). Then we sorted the risk scores of patients in the TCGA cohort, and marked the survival time and survival status of the samples in the dot plot. We found that the number of deaths in the high-risk group was much greater than that in the low-risk group, and their survival time was generally lower than that of the lowrisk group. Then we performed ROC curve analysis to evaluate the ability of the MRGs signature as a detection method, and its area under the curve (AUC) reached 0.8 ( Figure 7D), which illustrated the reliability of our MRGs signature.
After evaluating the expression characteristics of MRGs, we further performed univariate and multivariate Cox www.aging-us.com 22127 AGING regression analysis to evaluate the role of MRGs signature in the prognosis of glioma patients. Univariate Cox analysis showed that MRGs risk score, age, the histology, the glioma grade, IDH gene mutant status and Karnofsky_score were closely associated to the prognosis of patients with Grade II/III glioma. The variables with P<0.05 in the univariate analysis were used for multivariate Cox analysis, and the results showed that MRGs signature were an independent risk factor when the clinical characteristics above were taken into consideration. (Table 1). In the CGGA part B and C cohorts, the MRGs risk score was still an independent prognostic factor (Tables 2, 3). These results indicated that the MRGs signature could be an independent prognosis characteristic of the patients with Grade II/III glioma.
In experimental verification, we randomly selected 10 cases with a follow-up time of more than 3 months, and divided them into two groups according to their survival time (5 patients per group). All sample information is shown in the Supplementary Table 1. QPCR experiment was carried out to measure the mRNA expression of 8 MRGs in the ten patients. The results showed that the high survival group had lower risk score than that of the low survival group (Figure 8; P<0.001). This experiment verified the good clinical application value from another aspect of our MRGs signature. After identifying prognostic value of the risk score, we performed correlation analysis between the MRGs signature and immune cell infiltration for Grade II/III glioma in Figure 9. The risk score was significantly correlated with the infiltration of CD4 + T cells, CD8 + T cells, dendritic cells, macrophages and neutrophil cells (all P < 0.05), suggesting that the level of immune infiltration was generally increased.

DISCUSSION
Solid tumors consist of tumor cells and non-tumor tissue. Non-tumor tissue outside of tumor cells is often referred to as the tumor microenvironment. The composition of the solid tumor is usually determined by the pathologist through visual assessment, which may be affected by histopathological sensitivity, interobserver variability, and accuracy. With the development of high-throughput sequencing platform, the research on molecular mechanism of tumor is changing with each passing day. In addition to routine pathological assessments, advances in genomics have introduced many computational methods based on polygenic data to determine the tumor microenvironment [26]. These methods are highly consistent with pathologists' observations and are widely used in the study of various tumors.
Previous researches have studied the value of microenvironment-related gene signal models in predicting the prognosis of patients with different types    microenvironment-related genes to construct a breast cancer prognostic risk model and found that the model can accurately stratify patients with different survival outcomes [27]. All of these indicate that tumor microenvironment-related genes play key roles in the tumor prognosis, and are prognostic indicators potentially. However, the value of microenvironmentrelated genes for Grade II/III gliomas remains to be elucidated.  AGING Through high-throughput expression profiling, WGCNA, lasso, and multivariate Cox analysis, we acquired eight genes (HCK, HAVCR2, CD37, LPAR5, NAGA, C1QC, FCER1G and AIF1) to construct the MRGs signature in Grade II/III glioma patients. We first evaluated characteristics of the microenvironmentrelated genes in TCGA and found that the survival rate in the high-risk group was significantly lower than that in the low-risk group which indicated that the MRGs signature could accurately stratify patients with different survival outcomes. The ROC analysis results showed the signature had a good predictive effect on glioma prognosis. In addition, the multivariate Cox regression model showed that the MRGs signature was an independent risk factor when the clinical characteristics such as age, pathological type, grade, IDH mutation, and K-score were taken into consideration. Similar results were observed when verifying in the CGGA Part B and Part C cohort. These results indicated that the MRGs signature was a novel and important pathway in predicting tumor immune infiltration and patient survival outcome in Grade II/III glioma. predict survival and. It may be utilized as a prognosis stratification tool added to the current system due to the good prospect of clinical application.
Previous studies have shown that immune infiltration is an important factor in determining glioma response and prognosis. We analyzed the correlation between the MRGs signature and infiltration of immune cells. Convincingly, we found that the MRGs risk score was positively correlated with several common types of infiltration immune cells (B cells, CD4 + T cells, CD8 + T cells, neutrophils, macrophages and dendritic cells). Further research needs to be done to explore the relationship among MRGs and immune infiltration and the impact of glioma prognosis.
The biological functions of some MRGs have been elucidated in glioma and other tumors. The protein expressed by HCK is a tyrosine kinase in Src family. This protein is mainly present in the hematopoietic system, especially in B-lymphoid and myeloid lineages cells. It may play a critical role in helping couple Fc receptors in immune cells [28]. HCK protein regulates the phagocytosis of neutrophils through the G proteincoupled IL8 receptor. In macrophages, HCK protein can regulate the immune response by enhancing their proliferation, migration and secretion of IL6 and the macrophages of the HCK knockout mice showed weak phagocytosis [29,30]. Previous studies have shown that in leukemia breast and colon cancer the up-regulation of HCK protein is often accompanied by higher tumor grades and the prognosis of patients is mostly poor [31][32][33]. But the role of HCK in glioma is still unknown. The protein encoded by HAVCR2 (also named Tim-3) belongs to the immunoglobulin superfamily, and TIM family. It is a Th1-specific cell surface protein that regulates macrophage activation, and inhibits Th1mediated auto-and alloimmune responses, and promotes immunological tolerance [34]. TTim-3 along with PD-1 marks exhausted CD4 and CD8 T cells in experimental models and in humans [35][36][37]. Studies indicated that the percentage of Tim-3 and PD-1 co-expressing T cells in tumors and blood of GBM patients increased compared with healthy group [38]. Although anti-Tim-3 alone is not enough to produce a therapeutic response, the combined application of anti-Tim-3 and anti-PD-1 antibodies can significantly improve the survival rate of a syngeneic mouse orthotopic glioma model [39]. CD37 is a member of the tetra-spanning superfamily and is involved in the cell adhesion, movement, differentiation, intercellular communication and immune response. In addition, it also participates the interaction between B and T cells and the production of immunoglobulin G [40]. CD37 is highly expressed on the surface of most CLL and NHL cases, which is an attractive target for immunotherapy [41]. However, its role in glioma is still unclear. LPAR5 acts as a rhodopsin class of G proteincoupled transmembrane receptors [42]. ATX-LPAR signaling axis is reported to induce MMP-9 expression in hepatocellular carcinoma (HCC) subsequently enhancing the invasive capacity of these cells [43]. NAGA encodes the lysosomal enzyme alpha-Nacetylgalactosaminidase, which cleaves alpha-Nacetylgalactosaminyl moieties from glycoconjugates [44]. At present, there are very few researches on NAGA. C1QC protein is the C-chain polypeptide of C1q. C1QC is mainly involved in complement-mediated innate and adaptive immune response. Study showed that C1q promotes the maturation of dendritic cells (DC) through the combination of pathogen-associated molecular patterns and danger-related molecular patterns, and then interacts with different molecules on the surface of dendritic cells in the early stages of immunity [45]. FCER1G, also known as FcRγ, is a component of the immunoglobulin E receptor and IL-3 receptor complex. FCER1G mainly participates in biological processes of allergic inflammation signals and can regulate the activation of neutrophils and platelets [46]. FCER1G plays a tumor-promoting role in various tumors such as leukemia and solid tumors of the renal and Meninges [47][48][49][50]. AIF1, also called IBA1, encodes a protein that binds actin and calcium. The function of AIF1 protein is closely related to the activation of macrophages and is a specific marker for the activation of macrophages and microglia in the central nervous system [51]. It is reported that in hepatocellular carcinoma macrophages overexpressing AIF1 promoted tumor migration and proliferation of tumor cells [52]. In vitro study showed that AIF1 promoted tumor growth via the NF-κB/cyclin D1 AGING pathway [53]. The biological roles of most MRGs remain largely unclear in Grade II/III glioma. Further studies are required to explore the underlying mechanisms of the MRGs on glioma.
There are several differences between our work and previous research. First, we focused on the microenvironment-related genes expression pattern in Grade II/III gliomas using ESTIMATE algorithm. This is the first application of the algorithm to Grade II/III gliomas. Secondly, unlike articles with other single analysis methods, we use multiple algorithms (including estimation algorithms, WGCNA, and lasso regression) to identify microenvironment-related genes. Finally, the results were verified using two independent databases and a small sample experiment, making our conclusion more reliable.
Inevitably, our research has some shortcomings. Firstly, we use the retrospective data from public databases that have not been validated in prospective study. Secondly, the mechanisms of the MRGs on Grade II/III glioma needs further experimental research in vitro and in vivo. Additionally, due to the small number of patients in experimental verification, the signature still needs to be verified by larger samples.

CONCLUSIONS
To sum up, we discovered 8 microenvironment related genes, and constructed a signature to group patients at low and high risk of low survival time. This feature may have potential significance in evaluating the prognosis of Grade II/III glioma patients.

Data processing
The TCGA -LGG level 3 gene expression profiles and the clinical data of 529 samples of Grade II/III glioma patients were acquired from the TCGA data coordination center (https://portal.gdc.cancer.gov/) [54]. A total of 19767 protein coding genes were quantified by fragments per kilobase of exon per million reads mapped (FPKM). The ESTIMATE algorithm in the R project was applied to calculate the stromal and immune scores of each glioma sample in the TCGA dataset. 440 Grade II/III glioma samples including RNA sequencing in the CGGA part B dataset and 182 Grade II/III glioma samples in the CGGA part C dataset was acquired to verify the signature (http://www.cgga.org.cn/) [55,56]. Cases with a follow-up time greater than 3 months were used for later analysis. 10 simples from Renmin Hospital of Wuhan University with clinical data was used for experimental validation.

Co-expression network construction and identification of clinical significant modules
The R package 'limma' was applied to preprocess the downloaded raw data including background adjustment and normalization. Using the R package 'WGCNA', we further processed the dataset by variance analysis. 14828 genes by P-value were screened out due to the low sensitive of the WGCNA analysis in low expression changes genes amid samples. The remaining 4895 genes were chosen for further analysis. The gene co-expression network was constructed by WGCNA package in R using the expression data profile of these 4 895 genes. 6 was selected as the soft threshold to convert the coexpression similarity matrix into an adjacency matrix. Then the topological matrix was developed by using the topological overlap measure (TOM) in the R project. The minimum size of genes for each module was set as 30. The feature genes of modules are calculated, and the similar modules are clustered and merged according to the module dissection threshold. Finally, the correlations between gene modules and clinical traits were calculated and visualized through a heatmap. The modules which is positively related to ESTIMATE scores was selected for further analysis.

Gene ontology and pathway enrichment analysis
ClueGO (v2.5.5) is a Cytoscape (v3.7.1) plugin that visualizes the non-redundant biological terms for large clusters of genes in a functionally grouped network. CluePedia (v1.5.3) is a functional module of ClueGO [57,58]. Biological process (BP), cellular component (CC) and molecular function (MF) functional groups in GO terms and KEGG pathways analysis of selected genes were enrolled and visualized using ClueGO and CluePedia. The pathways with adj-P < 0.05 were visualized in Cytoscape.

Construction of the MRGs prognostic signature
The Least Absolute Shrinkage Sum Selection Operator (LASSO) is an analysis method that accurately sets the regression coefficients of many irrelevant features to zero, thereby reducing interference variables. It is an important method and has many applications in regression analysis [59,60]. The LASSO regression analysis was performed by R package 'glmnet'. In the analysis, genes in the module which is positively related to ESTIMATE scores were used as the input. Eight genes were got from the LASSO regression. Then, multivariate Cox regression analysis was performed to evaluate the weight coefficient of the genes and constructed a prognostic risk model. Overall survival was analyzed by R package 'survival' using Kaplan-Meier method, and the log-rank test was performed to explore qualitative variables as appropriate of the differences of survival between groups. Timedependent receiver operating characteristics (ROC) analysis using the R package 'survivalROC' was carried out to investigate the prognostic accuracy of MRGs signature. Immune infiltrate data of Grade II/III glioma cases containing the level of 6 types of tumor-infiltrating immune cells (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages and dendritic cells) was obtained from the Tumor Immune Estimation Resource (https://cistrome.shinyapps.io/timer/) [61].

Quantitative polymerase chain reaction
10 human glioma simples were obtained from the Department of Neurosurgery, Renmin Hospital of Wuhan University. Quantitative PCR of the 8 genes mRNA level was perform using Bio Rad CFX Connect Real-Time System. GAPDH was used for normalization.

Statistical analyses
Statistical analyses were performed using SPSS v25.0 (SPSS Inc. Chicago, IL, U.S.A.). The univariate and multivariate analysis was performed by cox regression model. Hazard ratios (HRs) with their respective 95% confidence intervals were calculated by Wald test. All tests were two-sided and P <0.05 was considered statistically significant.

Ethical approval
This study was approved by the Institutional Ethics

AUTHOR CONTRIBUTIONS
QC and BL designed this study. YL, GD, YQ, HZ, RG, HJ, ZY and BL performed the data collection and collation. YL wrote the paper. GD revised the manuscript.

ACKNOWLEDGMENTS
We gratefully acknowledge the TCGA and CGGA project organizers as well as all study participants for making data and results available.

CONFLICTS OF INTEREST
All authors declare that they have no conflicts of interest.

FUNDING
The present study was supported by grants from the National Science Foundation of China (nos. 81502175 and 81572489).