Integrating machine learning and bioinformatics analysis to m6A regulator-mediated methylation modification models for predicting glioblastoma patients’ prognosis and immunotherapy response

Background: Epigenetic regulations of immune responses are essential for cancer development and growth. As a critical step, comprehensive and rigorous explorations of m6A methylation are important to determine its prognostic significance, tumor microenvironment (TME) infiltration characteristics and underlying relationship with glioblastoma (GBM). Methods: To evaluate m6A modification patterns in GBM, we conducted unsupervised clustering to determine the expression levels of GBM-related m6A regulatory factors and performed differential analysis to obtain m6A-related genes. Consistent clustering was used to generate m6A regulators cluster A and B. Machine learning algorithms were implemented for identifying TME features and predicting the response of GBM patients receiving immunotherapy. Results: It is found that the m6A regulatory factor significantly regulates the mutation of GBM and TME. Based on Europe, America, and China data, we established m6Ascore through the m6A model. The model accurately predicted the results of 1206 GBM patients from the discovery cohort. Additionally, a high m6A score was associated with poor prognoses. Significant TME features were found among the different m6A score groups, which demonstrated positive correlations with biological functions (i.e., EMT2) and immune checkpoints. Conclusions: m6A modification was important to characterize the tumorigenesis and TME infiltration in GBM. The m6Ascore provided GBM patients with valuable and accurate prognosis and prediction of clinical response to various treatment modalities, which could be useful to guide patient treatments.


INTRODUCTION
Glioma is the highest incident primary brain tumor, accounting for ~80% of all adult brain cancer cases [1]. Glioblastoma multiforme (GBM) is classified as a grade IV cancer by the World Health Organization (WHO) and comprises 57.3% of all gliomas [2]. Its clinical presentation varies depending on its location and size and can be characterized by headaches, seizures, neurological dysfunctions, etc. [3]. Currently, little is known about its etiology [4]. Although multiple treatments have been developed for treating the patients, including surgery, chemotherapy, radiation and others, their survival rate remains far from satisfactory [5]. In addition, despite the promising efficacies of immunotherapy, most patients do not respond well [6,7], urging the need for better treatments [8].
GBM develops in a highly complex tissue environment and depends on the continuous growth, invasion and metastasis of these environments, which is why it is constantly infiltrating and challenging to cure. Targeting the tumor microenvironment (TME) was shown to potentially decrease the risk of drug resistance and disease recurrence. TME has a variety of capabilities to induce the beneficial and unfavorable consequences of tumorigenesis, so how to destroy and promote the occurrence of GBM The TME is a challenging job [9][10][11].
Recent literature reported that m6A was related to immunotherapy, which affects TME and its related immune cells by regulating targeted RNA. Therefore, m6A has become a potential target for immunotherapy. m6A may have different roles in different tumors, but the m6A detector has been shown to have broad significance in specific cancers [22][23][24]. In several cancers, including kidney, lung, gastric and other cancers [25][26][27][28][29][30][31], m6A-related signals have been identified as tumor immunity Phenotype and biomarker of anti-PD-1 immunotherapy response. These findings indicated that M6A modifications could mirror the TME status and anticipate immunotherapeutic effects in numerous cancers, not limited to specific cancer types. Another study found that m6A is inseparable from low-grade gliomas [32]. However, due to technical limitations, most studies focused on 1 or 2 m6A-related genes. TME is characterized by multiple highly coordinated components in a network, so there is no research on combining m6A and TME. In addition, deeper investigations on m6A methylation modification in GBM are needed to improve treatment outcomes.
In this study, we used 19 GBM-related m6A regulatory factors to analyze m6A modification patterns in GBM samples from the Chinese Glioma Genome Atlas (CGGA; http://www.cgga.org.cn/) and Cancer Genome Atlas (TCGA) databases, which were then validated in our own cohort. Next, we proposed a nomogram for quantifying the m6A modification patterns of GBM patients and predicting their potential immunotherapy response and prognosis. Altogether, our results showed that m6A modification promoted GBM progression and the potential clinical significance of our scoring system to guide the treatment and estimate the survival of GBM patients.

Data
The mRNA expression profile data and sample copy number variation (CNV) of Caucasian GBM samples are downloaded from the University of California, Santa Cruz (https://xenabrowser.net/datapages/). The expression profiles of Chinese GBM samples were obtained at the CGGA. In addition, were downloaded from The GEO database (https://www.ncbi.nlm.nih.gov/geo/) were queried to obtain the expression profiles of three groups of glioblastoma samples, namely GSE83300 [33], GSE74187 [34,35]. For clinical information and data, the R package cgdsr and [36] are used for processing. Table 1 shows the retrieved information from the datasets. Perform consistency processing on the above data, including z-score [37] and batch correction [38,39].

Unsupervised clustering
Expression data on 21 m6A regulators was extracted from 1,206 data in all data sets to determine m6A modification patterns. Some data set did not detect the expression of IGF2BP1 and METTL14, so the final expression extracted was 19 regulatory factors. These Conclusions: m6A modification was important to characterize the tumorigenesis and TME infiltration in GBM. The m6Ascore provided GBM patients with valuable and accurate prognosis and prediction of clinical response to various treatment modalities, which could be useful to guide patient treatments. , and based on their expression levels, various m6A modification patterns were determined and patients' data were also analyzed [40].

Gene set variation analysis (GSVA) and single sample gene set enrichment analysis (ssGSEA)
GSVA enrichment analyses were conducted for determining the difference of m6A modification mode using c2.cp.kegg.v6.2 from the MSigDB database (https://www.gsea-msigdb.org/gsea/index.jsp) [41]. ssGSEA analyses were conducted to determine the ratio of 24 immune cells in GBM [42], the Wilcox test for comparing different samples, and Cox regression analysis for comparing their survival.

Differentially expressed genes (DEGs) between m6A clusters
The TCGA, CGGA and GEO datasets were grouped into two categories according to the 19 m6A gene expressions (Supplementary Tables 1, 2). Using the R "limma" package, we identified DEGs among the different m6A clusters [43] based on P<0.05 and difference multiple >1.5 times or <0.67 times.

m6A score calculation
Using the DEGs identified above, we implemented the random forest method to eliminate redundancy and perform survival analysis (P value <0.05). The DEGs were classified into two groups (positive or negative coefficients) through cox regressions. Refer to GGI scoring based on the equation to calculate m6Ascore [44].
Scale represents the standardization process, while X and Y represent the gene set expressions using a positive and negative Cox coefficient, respectively.
surv_cutpoint function was used for identifying the best threshold point (cutoff=-0.9884624) [45] to divide the samples into a high and low m6A score, performing correlation analysis and survival estimation.

Correlation between m6A score and other biological processes
GSVA analyses were conducted for quantifying the biological functions of the samples, and Pearson correlation analysis on the m6Ascore and ES scores of these biological functions to reveal underlying biological pathways involved in m6Ascore.
GISTIC (Q≤0.05, 95%CI) was computed to determine the common copy number change area in the samples [46].
The R "pRRophetic" package was used to obtain the IC50 of Cisplatin and Gemcitabine based on the expression profiles and to compare the difference in IC50 between the high and low m6A scores.
The TIDE tool (http://tide.dfci.harvard.edu/) was used to assess the clinical effect of immunotherapy. A higher TIDE score indicated poor immunotherapy efficacy and prognosis. Among five cancers with immune dysfunctions and rejection characteristics, only melanoma cases treated with anti-PD1 or anti-CTLA4 were available in public databases. Immune checkpoint treatment prognosis prediction in this analysis is completed by TIDE score.

Genetic variation of m6A regulatory factor of GBM in TCGA database
Here, we aimed to investigate the m6A regulators' genetic backgrounds and variations in GBM. First, we selected all tumor samples in the TCGA database and analyzed the 21 m6A regulators, the eight writers, two erasers and 11 readers. First, we studied the mRNA levels of the m6A regulatory factors between TCGA-derived tumors and normal samples, which showed greater expressions in the tumor samples, including METL3 and WTAP (p<0.01) ( Figure 1A). This shows that the role of m6A, in general, is to promote tumor growth.
Then, the CNV and somatic mutation rate of 20 m6A regulatory factors in GBM samples were investigated. HNRNPC has the highest mutation frequency, reaching 5% ( Figure 1B). CNV mutations are generally changed in regulatory factors. The copy number of some genes is amplified. The frequency of CNV deletion is noticeable in genes such as HNRNPC, METL3 and ZC3H13 ( Figure  1C and Supplementary Tables 3, 4). In addition, m6A regulated The position of the organ on chromosomes ( Figure 1D). PCA analysis showed that tumors and normal samples could be distinguished ( Figure 1E).

Unsupervised clustering of m6A genes in GBM samples
Due to the lack of IGF2BP1 and METTL14 expression levels in some data sets, we determined the consistency clustering of the m6A genes and m6A gene single-factor Cox regression using gene expression profile data of 19 m6A regulators and the survival data from the TCGA, CGGA and GEO data sets. Findings from the m6A regulatory networks indicated the associations between the m6A regulatory factors (Figure 2A) and that between regulatory factors and prognosis. Our findings indicated that interactions between m6A regulators of different functional categories played essential roles for forming different m6A modifications in GBM. Then, the expressions of the regulators from TCGA, CGGA and GEO were obtained, and we utilized the R "ConsensusClusterPlus" package to conduct  consistency clustering, based on which m6A clusters A and B were identified ( Figure 2B) and further analyses showed that the prognosis between them was similar (P=0.23, Figure 2C).

Functional annotations and TME infiltration characterization of the m6A clusters
GSVA enrichment analysis was conducted to assess the differences between the biological behaviors of the regulatory factors in the two m6A modification subgroups. m6A cluster A was found to be significantly enriched in cell cycle pathways, while m6A cluster B in biosynthesis and oxidative phosphorylation of glycosphingolipids pathways ( Figure 2D, Significant analysis results are shown in Supplementary Tables 6, 7).
Furthermore, ssGSEA analysis using the RNA-seq data was conducted to determine the proportions of 28 immune cells, such as B memory cells, activated Dendritic cells and M0 macrophages, in GBM samples ( Figure 2E) and the findings showed significantly different distributions of immune cells abundance between the two subgroups. Figure 2F shows the results of single-factor Cox regression analysis of different proportions of immune cells between m6A clusters A and B (Analysis results are shown in Supplementary Table 8).
In the TCGA database, we found that the mutation difference of IDHI (chi-square test, P=0.1998181), TP53 (chi-square test, P=0.4284502) and EGFR (chisquare test, P=0.9538232) in subgroups m6AclusterA and m6AclusterB were not significant ( Figure 3A, specific information is shown in Supplementary Table  9). No obvious differences were observed between their clinical characteristics ( Figure 3B).
GSVA analysis was carried out with gene set constructed by Mariathasan; there is a significant difference of enrichment scores between the m6Acluster groups ( Figure 3C, analysis results are shown in Supplementary  Table 10). Additionally, the m6A regulatory factor distributions in m6AclusterA and m6AclusterB are shown in Figure 4 (Analysis results are shown in Supplementary Table 11).

m6A-related DEGs and constructions of the m6Agenecluster
The biological behaviors of each m6Acluster were detected in 73 phenotype-related differential genes with a limma software package (Supplementary Table 12).
Furthermore, we implemented an unsupervised cluster analysis using the phenotype-related genes of m6A to classify patients into separate genomic subgroups according to different genomic subtypes based on their m6A-modified genomic phenotypes, m6A gene cluster A and m6A gene cluster B ( Figure 5A and Supplementary Table 13). The findings showed the prognosis of m6A gene cluster A tumor was better than m6A gene cluster B, and part of the m6A regulatory factors expression levels were markedly greater compared to m6A gene cluster B ( Figure 5C).

Analysis of m6Ascore
The Random Forest method was implemented to remove the redundancy in the DEGs and identified significant genes associated with the classification (Supplementary Table 14). Cox regression was performed to establish the association between the significant genes and GBM patients' survival. The genes' coefficient values were used to separate them into two groups, and all samples were scored with the m6Ascore formula. Finally, based on the surv_cutpoint function of R package survminer, the optimal threshold of m6score was determined (cutoff=-0.9884624) as a standard to divide the samples into m6Ascore high group and m6Ascore low group ( Figure 6A and Supplementary  Table 16). Our results indicated that the m6Ascore low group had significantly better survival than the m6Ascore high group (P<0.0001), suggesting that the m6Ascore could be used to accurately characterize GBM patients' prognoses ( Figure 6B).
Correlation assessment between m6Ascores and known gene features indicated a significant positive correlation between m6Ascore and biological functions like EMT2 and immune checkpoint ( Figure 6C and Supplementary Table 15). Wilcox test showed significant differences between m6Acluster and m6Agenecluster in m6ascore ( Figure 6E, 6F). Further, the m6AclusterA and m6AgeneclusterA had significantly superior m6Ascore compared with the other groups.
Furthermore, in the TCGA cohort, a significant difference in m6A scores between the subgroups, such as IDH1 mutation status and TP53 mutations, were found.

m6Ascore predicted chemotherapy and immunotherapy responses of GBM patients
In TCGA, CGGA and GEO datasets, we further estimated and compared the IC50 values of Cisplatin and Gemcitabine according to the expression levels of the m6Ascorehigh and m6Ascorelow groups using the R "pRRophetic" package, and found that a significantly greater IC50 value in the m6Ascorelow group compared with the m6Ascorehigh group, which suggest that patients in the m6Ascore high group had poorer drug resistance (Cisplatin: P<2.2e-16, Gemcitabine: P=8.1e-13; Figure 9A, 9B).
Meanwhile, in 154 samples with sequencing data (including 139 samples with chip data) from the TCGA database, a numerically greater TIDE score was observed for the m6Ascorehigh group compared with the m6Ascorelow group ( Figure 9C), but the difference is not significant. The AUC of the ROC curve for the m6score in predicting immunotherapy response was 0.53.

DISCUSSION
Multimodal therapies are the standard treatments for GBM, which include surgery resection, radiation therapy, and systemic treatment with temozolomide [47]. However, traditional treatment methods often have serious side effects, and the treatment effect is not very good, thereby cannot significantly improve patients' survival [48]. We explored the molecular and clinical characteristics between different subgroups based on the 19 m6A genes through consistent clustering. Further, by dimensionality reduction in m6A-related DEGs between the subgroups, we determined their association with patient prognosis. Then m6Ascore of these genes were calculated, and the differences between the m6Ascore high group and m6Ascore low group were compared.m6Ascore based classification was used for prediction of immunotherapy response. There exist some advantages of this study. Significant differences are shown in prognosis, clinical features or molecular characteristics between the groups of m6Agenecluster and m6Ascore. There are also some limitations. First, in the prediction of immunotherapy response, AUC of ROC curve is comparably low, only reached 0.53. Second, the difference of m6Acluster in prognosis and clinical characteristics were not significant. The clinical features of this part of glioblastoma did not find the status information of 1p19q. By consulting the data, this feature belongs to GBM.  indicates that m6A modifies the genome table Type has an obvious relationship with overall survival rate; (C) Expression of 19 m6A genes in 2 gene clusters. The upper and lower ends of the box indicate the interquartile range of values. The line in the box indicates the median value, and the black dots indicate outliers. The T test is used to test the statistical differences between gene clusters (**P<0.05, ***P<0.01, ****P<0.001).
In the past decades, many new concepts were investigated as an attempt to improve the treatments of GBM, including genetic testing, electromagnetic field therapy, and function-guided resection [2,49]. However, the underlying causes of GBM malignant progression have not been explored, so there are still no biomarkers to accurately predict the prognosis and exacerbation of GBM patients [32,50].
The TME and purity of tumor cells have important roles in tumor invasion and progression [51]. Therefore, by comprehensively analyzing the properties of TME and the cells it recruits in GBM, the immunophenotype of GBM at various stages can be identified, so as to find accurate biomarkers and discover new and effective therapeutic targets [52,53].
Determining the roles and underlying mechanisms of m6A modification is a hot topic in cancer research and recent investigations showed that m6A regulators can regulate various tumor progression processes [11,54]. Based on the limitations of previous related studies our group's previous studies reported that the key markers and TME of LGG and GBM are not the same. In previous studies, the research group has found that m6Ascore accurately predicted LGG progression. Thus, in this present study, we carried out a precise analysis of GBM and found that the m6Acore could predict GBM    patients' prognosis and immunotherapy effects. These findings could be used as references to further determine the GBM's etiology and assess more beneficial treatments.

CONCLUSIONS
Our findings showed that m6A modification played pivotal roles in the tumorigenesis and TME infiltration characterization of GBM using data from large cohorts. Our proposed m6Ascore accurately predicted patients' prognosis and potential chemotherapy and immunotherapy responses, providing novel perspectives to deepen our understanding on the pathogenesis of GBM and identify potential targets to improve treatment outcomes.

AUTHOR CONTRIBUTIONS
The work presented here was carried out with the collaboration of all authors. WR Liu and CY Li defined the research topic and discussed the analyses and their interpretation, and presentation. WR Liu, HD Huang and HN Huang developed the algorithm, found clinicopathological data and performed the statistical analyses. WR Liu and CM Liu drafted the manuscript, recorded the clinical data, analysed the m6A events and interpreted the results. CY Li,XY W and CM L participated in reviewing all clinical records and performed the associated data collection. All authors read and approved the final manuscript.

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.

FUNDING
This study was supported by grants from: 1. 2020 Guangxi Zhuang Autonomous Region Health Committee self-funded scientific research project, project number 20201558, 2. In 2020, the general project of high-level talent scientific research project of the Affiliated Hospital of Youjiang Medical College for Nationalities (the young and middle-aged backbone talent project), contract number Y202011702, 3. 2021 Guangxi University's young and middle-aged teachers' basic research ability improvement project, project number 2021KY0542.

Editorial note
& This corresponding author has a verified history of publications using a personal email address for correspondence.