Construction of a prognostic model for lung squamous cell carcinoma based on seven N6-methylandenosine-related autophagy genes

: Objective: We aimed to construct a novel prognostic model based on N6-methyladenosine (m6A)-related autophagy genes for predicting the prognosis of lung squamous cell carcinoma (LUSC). Methods: Gene expression profiles and clinical information of Patients with LUSC were downloaded from The Cancer Genome Atlas (TCGA) database. In addition, m6A- and autophagy-related gene profiles were obtained from TCGA and Human Autophagy Database, respectively. Pearson correlation analysis was performed to identify the m6A-related autophagy genes, and univariate Cox regression


Introduction
Lung cancer is a common malignancy that affects human health, and non-small cell lung cancer (NSCLC) accounts for 85% of such tumors [1]. NSCLC has two main histological subtypes: lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD). LUSC, which accounts for 20-30% of NSCLC cases, is characterized by high metastasis and a high recurrence rate, and most patients with LUSC are diagnosed at an advanced stage with poor prognosis [2]. Thus, there is an urgent need to explore prognostic markers and therapeutic targets for LUSC.
Autophagy is an evolutionarily conserved protein degradation process that maintains cellular homeostasis by degrading long-lived proteins and defective organelles [3]. Previous studies have demonstrated that dysregulation of autophagy is closely related to tumorigenesis [4]. However, the role of autophagy in tumor pathogenesis and its impact on prognosis are complex and remain unclear. Autophagy has been reported to involve three pathways: macrophagy, microphagy, and chaperonemediated autophagy (CMA) [5]. Losmanová et al. [6] have found that CMA-related markers such as LAMP2A and HSC70 are poor independent prognostic biomarkers in primary resected LUSC. In addition, Xu et al. [7] have indicated that TRIM29 promotes the proliferation and invasion of LUSC through E-cadherin autophagy degradation. These studies demonstrate that autophagy-related genes play a role in the progression and prognosis of LUSC.
Previous studies highlighted that N6-methyladenosine (m6A) alters the expression and biological function of genes via regulating RNA stability, microRNA processing, mRNA, and mRNA translation, and its dysregulation has been observed in various human cancers [8,9]. Existing evidence revealed that the expression profiles of m6A genes have great prospects in the prediction of NSCLC prognosis, and these genes may be considered as potential therapeutic targets for LUSC [10]. For example, the m6A reader YTHDC2 was downregulated in NSCLC patients and cell lines, and it could predict poor prognosis and promote tumor progression [11]. These studies indicated that m6A methylation-related genes provide more possibilities for the diagnosis and treatment of cancer.
It is known that m6A and autophagy genes are responsible for LUSC prognosis. Notably, prognostic models constructed using a series of biomarkers have great potential for the diagnosis and prognosis of patients with LUSC. Although several studies have established prognostic models of patients with LUSC based on autophagy genes, a prognostic model based on m6A and autophagy genes is lacking. In this study, mRNA expression data and clinical information of patients with LUSC were obtained from The Cancer Genome Atlas (TCGA) database. In addition, autophagy-related genes were downloaded from the Human Autophagy Database (HADb), and m6A genes were extracted from LUSC samples from the TCGA database. Next, the m6A-related autophagy genes were screened using Pearson correlation for further analysis. After Cox regression and LASSO analyses, prognosis-related autophagy genes were selected to construct the prognostic model. Furthermore, an independent dataset (GSE37745) was used to validate the predictive ability of this model. Finally, we explored the association between the prognostic model and the immune microenvironment. A flowchart of the study design is presented in Figure 1. Our study will help to propose targeted treatment strategies for patients with LUSC with certain clinical guidance values.

Screening of expression profile data
Gene expression profiles of lung cancer patients were downloaded from the TCGA database (https://gdc-portal.nci.nih.gov/). A total of 550 samples were obtained, which included 501 LUSC samples. Among them, 494 LUSC patients with LUSC had complete prognostic information, and these samples were used as the training set.
In addition, the relevant datasets that met the following criteria were screened from the National Center for Biotechnology Information Gene Expression Omnibus database (https://www.ncbi.nlm.nih.gov/geo/) using "lung cancer, homo sapiens" as keywords. 1) The samples had histological-type information (such as adenocarcinoma and squamous cell carcinoma), 2) the total sample size was greater than 150 (among these, the number of LUSC samples was not less than 50), and 3) the LUSC samples had complete survival information. After selection, one expression profile dataset that met the requirements, GSE37745 [12][13][14], was downloaded. This dataset included 196 samples, of which 66 had clinical prognostic information. The dataset was constructed based on the GPL570 Affymetrix Human Genome U133 Plus 2.0 Array platform.

Identification of m6A-related autophagy genes in LUSC samples
A total of 232 autophagy-related genes were obtained from HADb (http://www.autophagy.lu/) [10]. The m6A-related gene expression levels from TCGA LUSC tumor samples were extracted, and the expression levels of autophagy-related genes were downloaded for further analysis. In brief, the cor.test function in R 3.6.1 (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.test.html) was used to calculate the Pearson correlation coefficient (PCC) between m6A-related genes and autophagy genes, and the autophagy genes that were significantly associated with the m6A genes were screened. In addition, the m6A-autophagy pairs with |PCC| > 0.5 and false discovery rate (FDR) < 0.05 were used to construct a coexpression network, which was visualized using Cytoscape 3.6.1 [15]. Finally, the Gene Ontology Biological Process (GO_BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of genes contained in the co-expression network were performed using the Database for Annotation, Visualization and Integrated Discovery [16,17] (Version 6.8; https://david.ncifcrf.gov/). A FDR < 0.05 was used as the threshold for the significantly enriched category.

Screening of autophagy genes associated with prognosis
With regard to the TCGA training set, the survival package (Version 2.41-1; http://bioconductor.org/packages/survivalr/) in R 3.6.1 [18] was used to conduct univariate Cox regression analysis based on the data from the selected autophagy-related genes combined with the clinical survival and prognosis information of the samples. A p-value < 0.05 was used to determine prognostic-related autophagy genes.

Development and verification of prognostic model
Based on the prognostic-related autophagy genes, a LASSO Cox regression model [19] of the penalized package (Version 0.9.50) [20] in R 3.6.1 was used to screen the optimal number of autophagy gene markers. Thereafter, the samples were divided into low-and high-expression groups according to the median expression level of each gene. The Kaplan-Meier (KM) curve method in the survival package [18] was used to evaluate the relationship between the two groups and survival prognosis.
Furthermore, according to the LASSO prognostic coefficient of each selected autophagy gene and the gene expression level in the TCGA database, the prognostic score (PS) model was constructed using the following formula: Prognostic score (PS) = ∑βgenes × Expgenes Here, βgene represents the LASSO prognostic coefficient of the autophagy gene, and Expgene represents the expression level of the gene in the TCGA dataset.
Based on the PS formula, the PS values of each sample in the TCGA and validation sets (GSE37745) were calculated. All samples were divided into high-and low-risk groups according to the median PS score. Finally, the correlation between the high-or low-risk groups and prognosis was assessed using KM analysis.

Principal component analysis (PCA) of samples in TCGA and GSE37745 datasets
PCA is a dimensionality reduction technology that can convert multiple variables into a few principal components that reflect most of the original variable information [21,22]. According to the expression levels of m6A genes detected in the TCGA and GSE37745 datasets, PCA was performed on the samples of the two datasets using the psych package (Version 1.7.8) in R 3.6.1.

Correlation analysis of low-and high-risk groups and immune microenvironment
The proportions of 22 immune cell types in the LUSC samples from the TCGA database were calculated using CIBERSORT (https://cibersort.stanford.edu/index.php) [20]. Next, the differences in the proportions of various immune cell types in the high-and low-risk groups were compared via Wilcoxon rank-sum test.

Identification of autophagy genes related to m6A genes in LUSC samples
According to the threshold, a total of 256 m6A-autophagy interactions containing 112 autophagy genes and 18 m6A genes were obtained (Table S1). The m6A and autophagy co-expression networks are shown in Figure 2A. In addition, functional enrichment analyses of the obtained autophagy genes were performed. The results showed that these genes were markedly enriched in 51 GO_BP terms and 56 KEGG pathways (Table S2). Top20 GO terms and KEGG pathways were displayed. In brief, genes were mainly enriched in GO_BP terms such as autophagy, apoptotic process, and macroautophagy ( Figure 2B), and were involved in several KEGG signaling pathways such as the mechanistic target of rapamycin (mTOR) and phosphoinositide 3-kinase (PI3K)/protein kinase B (Akt) signaling pathways ( Figure 2C).

Construction and verification of prognostic model
Based on the 13 prognosis-related autophagy genes and the LASSO Cox regression algorithm results, the optimal number of combined autophagy genes was seven ( Table 1). The lambda values and coefficients of the seven genes are shown in Figure 3A-C. Next, the samples were split into low-and high-expression groups based on the gene expression levels. KM analysis revealed that patients with LUSC with low expression levels of CASP4, CDKN1A, DLC1, ITGB1, and PINK1 had shorter overall survival times, whereas patients with high expression levels of TP63 and EIF4EBP1 had longer survival times ( Figure 3D).
Furthermore, seven autophagy genes were used to establish a prognostic model, and the PS value of each gene was calculated. Five genes (DLC1, PINK1, CASP4, CDKN1A, and ITGB1) with homologous recombination (HR) scores > 1 were considered risk genes that were detrimental to prognosis, while two genes (TP63 and EIF4EBP1) with HR scores < 1 were regarded as protective factors that were beneficial to prognosis ( Figure 4A), which was consistent with the results of the survival analysis. In addition, a heatmap of the gene expression levels showed that DLC1, PINK1, CASP4, CDKN1A, and ITGB1 were upregulated, whereas TP63 and EIF4EBP1 were downregulated with an increase in PS value2 ( Figure 4B). Samples in the TCGA cohort were separated into high-and low-risk groups based on the median PS value. Survival analysis indicated that patients in the highrisk group had worse survival rates ( Figure 5A). As shown in Figure 5B, patient survival was associated with the risk score. Moreover, the area under the receiver operating characteristic (ROC) curve (AUC) of this model was 0.958 ( Figure 5C), suggesting that this model has a good ability to predict LUSC prognosis. We used another dataset (GSE37745) to validate the predictive ability of this model. The PS value of patients with LUSC in the GSE37745 dataset was also calculated. This divided the patients into high-and low-risk groups. Patients with a high PS had a lower survival ratio and shorter overall survival time ( Figure 5D). We also observed that patients in the high-risk group had a higher mortality rate ( Figure 5E). In addition, the AUC value of the ROC curve was 0.759, indicating that the prognostic model had high prediction sensitivity ( Figure 5F).

PCA results
PCA was conducted to compare the differences in m6A expression levels between the low-and high-risk groups. As shown in Figure 6, the low-and high-risk samples in the TCGA and GSE37745 datasets presented different clusters, indicating that there were differences in m6A levels between these two groups.

Association between different risk groups and the immune microenvironment
The proportions of 22 immune cell types in each sample from the TCGA database were calculated (Table S4). A total of nine immune cell types between the high-and low-risk groups presented significant differences, including CD8(+) T cells, M0 macrophages, and regulatory T cells (Figure 7).   Wilcoxon rank-sum test showed the immune cell types with significant differences in low-(gray nodes) and high-risk (red nodes) groups. *p < 0.05, **p < 0.01, and ***p < 0.001.

Discussion
Although advances in the diagnosis and treatment of LUSC have been made over the past several decades, patients with LUSC continue to have a low five-year survival rate of < 20% [23]. Therefore, it is necessary to explore novel prognostic markers and potential therapeutic targets for LUSC. Previous studies has indicated that m6A is the most common and abundant modification that affects the development of cancer by regulating m6A methyltransferase, demethylase, and binding protein; in addition, m6A regulators may serve as potential clinical therapeutic targets for cancers [24]. Meanwhile, autophagy acts as an adaptive response to protect tumor cells from stress, thereby promoting tumor progress, suggesting that autophagy is a promising target for cancer therapy [25]. However, to date, the potential role of m6A regulators and autophagy-related genes in LUSC prognosis is not well understood. In this study, a novel seven-gene prognostic model for LUSC was developed based on the combination of m6A and autophagy-related genes. According to the PS value of the model, patients in the test and validation sets were assigned to the low-and high-risk groups, and the K-M curve showed that patients in the high-risk group were closely associated with poor prognosis. Meanwhile, ROC curves suggested that this model has good predictive ability and specificity for the prognosis of LUSC, with AUC values of 0.958 (training set) and 0.759 (validation set). Furthermore, there was a significant difference in the proportion of the nine types of immune cells between the high-and low-risk groups.
Several studies demonstrated that m6A modification is involved in various biological processes related to cell survival and death, such as immune response, apoptosis, and autophagy [26]; meanwhile, autophagy is an important cellular process that controls cell fate. Thus, the relationship between autophagy and m6A is interesting and worth studying. In this analysis, we identified autophagy genes associated with m6A in the LUSC samples, and functional analysis showed that these genes were mainly involved in some pathways such as the PI3K/AKT and mTOR signaling pathways. Previous study indicated that PI3K/AKT/mTOR pathway was a most commonly activated signaling pathway in cancer, which lead to cell proliferation, differentiation, and survival [27]. Therefore, PI3K/AKT/mTOR pathway inhibitors may be considered as potential target drugs of LUSC [28].
In this study, we screened seven m6A-related autophagy genes (CASP4, CDKN1A, DLC1, ITGB1, PINK1, TP63, and EIF4EBP1) to establish a risk score model with a high predictive ability for the prognosis of LUSC. Survival analysis showed that CASP4, CDKN1A, DLC1, ITGB1, and PINK1 were considered risk genes that were detrimental to prognosis. Caspase-4 (CASP4), a type of inflammatory caspase, plays a host defense role by releasing pro-inflammatory interleukin-1-like cytokines [29]. A previous study indicated that CASP4 was overexpressed in NSCLC tissue and was responsible for poor prognosis; thus, it was suggested as a diagnostic and prognostic biomarker for patients with NSCLC [30]. Integrin subunit beta 1 (ITGB1) belongs to the integrin family and is involved in cell adhesion and recognition. Zhao et al. [31] found that ITGB1 participated in the tumorigenesis mechanism of NSCLC, and downregulation of ITGB1 could inhibit the progression of cancer. Cyclin-dependent kinase inhibitor 1A (CDKN1A) played tumorigenic effect via mediating G1 cell cycle arrest [32]. In NSCLC cell lines and tumor tissues, the expression level of DLC1 Rho GTPaseactivating protein (DLC1) was reduced through promoter methylation [33]. Chang et al. [34] demonstrated that higher PTEN-induced kinase 1 (PINK1) expression levels were associated with poor overall survival of patients with LUAD. Despite few studies reported the direct relationship between gene expression level and LUSC prognosis, these articles indicated the potential carcinogenic effects of these genes in cancer.
Further, we also observed that TP63 and EIF4EBP1 with HR scores < 1 may be regarded as protective factors for LUSC. Tumor protein p63 (TP63) combined with other genes can be used as biomarkers to distinguish LUSC from LUAD; thus, it may be the key to the development of precision medicine for patients with LUSC [35]. Consistent with our results, Zhu et al. [36] also found that the autophagy-related genes such as TP63 and eukaryotic translation initiation factor 4E binding protein 1 (EIF4EBP1) were protective factors and independent prognostic biomarkers for LUSC and LUAD. Taken together, we concluded that these seven autophagy-related genes may be potential therapeutic targets and prognostic biomarkers for patients with LUSC. However, the specific biological function of these genes in the prognosis of LUSC is still unclear, further clinical trials are needed to uncover the mechanism of action of these genes.
Tumor-infiltrating immune cells determine the prognosis and response to immunotherapies of cancer [37]. Thus, we analyzed the significant differences in immune cell infiltration between the lowand high-risk groups. The results showed that patients in the high-risk group had higher infiltration of plasma cell, memory resting CD4 + T cell, resting NK cell, and M0 macrophages. Zhu et al. [38] constructed an immune risk score model that contained memory resting CD4 + T cell could predict the prognosis and recurrence risk of patients with LUSC, indicating patients in high-risk group related to poor outcome. Another cell, resting NK cells, showed different infiltration in normal tissues and LUSC tissues, and several immune cells such as resting NK cells also played an important role in immunologic dysfunction and harmful tumor microenvironment induced by LUSC [39]. Furthermore, high-risk LUSC patients possessed a higher proportion of immune suppressive cells such as M0 macrophages, and these patients had a remarkable poor overall survival [40]. Meanwhile, we also found that the level of CD8 + T cell, naïve CD4 + T cell, regulatory T cell (Tregs), activated NK cell, and resting myeloid dendritic cell was decreased in the high-risk group. Evidences have reported the relationship between these cells and prognosis of LUSC. For example, high proportion of CD8+ T cells was associated with improved prognostic outcome in NSCLC, and Tregs also had implications on clinical outcomes of LUSC [41]. The expression of activated NK cells significantly decreased in tumor micro-environment of lung cancer, which was positively associated with the survival time of patients with lung cancer [42]. These studies further supported our findings, suggesting that differences in immune cell infiltration may lead to different prognoses of patients with LUSC.
This study has some limitations. These results were obtained through bioinformatics analysis, and animal experiments and clinical trials are needed to investigate the roles of these genes in LUSC. In addition, the prognostic model should be further verified in other large cohort studies to assess its reliability.

Conclusions
We developed a novel prognostic model including seven m6A-related autophagy genes (CASP4, CDKN1A, DLC1, ITGB1, PINK1, TP63, and EIF4EBP1) for LUSC. This model had good predictive ability for LUSC prognosis. The identified genes may be useful as biomarkers for regulating autophagy and as therapeutic targets in patients with LUSC.

Conflicts of interest
All authors declare no conflict of interest in this paper.