Development and validation of a novel survival model for acute myeloid leukemia based on autophagy-related genes

Background Acute myeloid leukemia (AML) is one of the most common blood cancers, and is characterized by impaired hematopoietic function and bone marrow (BM) failure. Under normal circumstances, autophagy may suppress tumorigenesis, however under the stressful conditions of late stage tumor growth autophagy actually protects tumor cells, so inhibiting autophagy in these cases also inhibits tumor growth and promotes tumor cell death. Methods AML gene expression profile data and corresponding clinical data were obtained from the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases, from which prognostic-related genes were screened to construct a risk score model through LASSO and univariate and multivariate Cox analyses. Then the model was verified in the TCGA cohort and GEO cohorts. In addition, we also analyzed the relationship between autophagy genes and immune infiltrating cells and therapeutic drugs. Results We built a model containing 10 autophagy-related genes to predict the survival of AML patients by dividing them into high- or low-risk subgroups. The high-risk subgroup was prone to a poorer prognosis in both the training TCGA-LAML cohort and the validation GSE37642 cohort. Univariate and multivariate Cox analysis revealed that the risk score of the autophagy model can be used as an independent prognostic factor. The high-risk subgroup had not only higher fractions of CD4 naïve T cell, NK cell activated, and resting mast cells but also higher expression of immune checkpoint genes CTLA4 and CD274. Last, we screened drug sensitivity between high- and low-risk subgroups. Conclusion The risk score model based on 10 autophagy-related genes can serve as an effective prognostic predictor for AML patients and may guide for patient stratification for immunotherapies and drugs.


INTRODUCTION
Acute myeloid leukemia (AML) is a kind of malignant blood cancer, accounting for about 1% of all cancers (Molica et al., 2019;Winer & Stone, 2019;Moors et al., 2019). AML is characterized by impaired hematopoietic function and bone marrow (BM) failure, leading to fatal consequences due to the clonal expansion of undifferentiated myeloid progenitor cells (Cai & Levine, 2019;Hunter & Sallman, 2019;Gill, 2019). Autophagy is an important biological process, vital to survival, differentiation, development, and homeostasis, and can play a very important role in tumors. Under normal circumstances, autophagy can inhibit the early development of cancer (Onorati et al., 2018;Glick, Barth & Macleod, 2010;Mizushima & Komatsu, 2011;Li et al., 2017) by eliminating damaged proteins and organelles and reducing cell damage and chromosome instability. However, under hypoxic or low nutritional conditions, tumors can obtain nutrients through autophagy (Boya et al., 2016;Kim & Lee, 2014;Fan et al., 2020;Parzych & Klionsky, 2014). Recent studies found that inhibiting autophagy effectively inhibits tumor growth and promotes tumor cell death (Luan et al., 2019;Wang et al., 2019;Liang et al., 2020). Moreover, autophagyrelated gene signatures can effectively predict the clinical outcome of pancreatic ductal adenocarcinoma and breast tumors, but the research on autophagy prognostic biomarkers of AML is still insufficient.
In this study, we used AML data from the TCGA database (TCGA-LAML) and the GEO database (GSE37642). We obtained 35 prognosis-related autophagy genes in the TCGA data and used 10 of those to construct a prognostic model and then verified it through the GEO database. Our model had good predictive performance suggests that these 10 autophagy genes may be related to the tumor microenvironment and could provide new insights for the therapeutic strategies and prognosis of AML.
The β in this formula refers to the regression coefficient. The GSE37642 data set was used as a validation 1 cohort. In addition, we further verified the reliability of the prognostic gene signature by randomly dividing the training set (TCGA-LAML) into a verification 2 cohort and a verification 3 cohort. The autophagy risk score of each patient was calculated according to the uniform formula determined in the training cohort. We determine the best autophagy risk scoring standard through the ''survminer'' software package (Walter, Sánchez-Cabo & Ricote, 2015), and then divide the patients into high-and low-risk groups. In addition, we also constructed a prognostic nomogram.

Estimation of immune cell type fractions
The CIBERSORT algorithm is used to estimate the immune cell types of TCGA data (Alaa et al., 2019;Gentles et al., 2015;Newman et al., 2019;Chen et al., 2018).

Generation of immunescore and stromalscore
The ESTIMATE package (Yoshihara et al., 2013) was used to estimate the ratio of immunestromal components in each sample in the tumor microenvironment in the form of two kinds of scores: Immune Score, and Stromal Score, which positively correlate with the ratio of immune and stroma, respectively. Meaning the higher the respective score, the larger the ratio of the corresponding component in the tumor microenvironment.

Functional enrichment analysis
The Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analysis of all differentially expressed genes (DEGs) by R software with p < 0.01 set as the threshold. Gene Set Enrichment Analysis (GSEA software, version 4.0.1) was used to investigate the pathways enriched in the high-risk subgroups. The number of random sample permutations was set at 10.

Statistical analysis
LASSO analysis was performed using the ''glmnet'' package (Engebretsen & Bohlin, 2019;Blanco et al., 2018). The number of folds used in cross-validation was 10. The Timedependent receiver operating characteristic (ROC) curve was used to evaluate the predictive performance of 10-gene features. The area under the ROC curve (AUC) was calculated by using the ''survivalROC'' package (Le et al., 2020;Do & Le, 2020;Li et al., 2021;Le et al., 2021). The decision curve analysis was carried out using the ''rmda'' software package. The ''rms'' software package was used for nomogram and calibration diagrams. We use one-way ANOVA to analyze multiple sets of normalized data. All statistical analyses were performed using R software (version 3.5.1) and GraphPad Software (version 7.00). p < 0.05 is considered statistically significant.

Establishing an autophagy-related model and functional enrichment analysis
Thirty-five autophagy genes were related to prognosis in TCGA (Fig. 1A), and LASSO regression analysis narrowed down the list (Figs. 1B, 1C), to include 10 autophagy genes The GO results indicated that 10 autophagy genes were significantly enriched in the biological process (BP) and cellular components (CC) categories (Fig. 1E), such as positive regulation of protein localization to nucleus, regulation of muscle cell apoptotic process, muscle cell apoptotic process, regulation of protein localization to nucleus, negative regulation of organelle organization, positive regulation of muscle cell apoptotic process, positive regulation of protein import into nucleus, positive regulation of protein import, intrinsic apoptotic signaling pathway in response to oxidative stress, protein localization to nucleus, intrinsic apoptotic signaling pathway, regulation of striated muscle cell apoptotic process, regulation of protein import into nucleus, negative regulation of mitochondrion organization, striated muscle cell apoptotic process, selective autophagy, regulation of protein import, inclusion body, integral component of organelle membrane, intrinsic component of organelle membrane, and nuclear envelope. In addition, it is worth noting that the results of the KEGG analysis did not enrich for obvious pathways.

Evaluation of autophagy risk score
After dividing patients into high-risk and low-risk subgroups, we found an important result that the high-risk group was significantly associated with poor prognosis in the TCGA-LAML cohort (P = 6.975e−09; Fig. 2A). The AUC of the one-, three-, and five-year overall survival (OS) in the TCGA-LAML cohort were 0.819, 0.846, and 0.887, respectively (Fig. 2B). Compared with the other six signatures (Chen et al., 2020), our signature showed a higher C-index (0.7240) and AUCs for one-, three-, and five-year OS predictions (Figs. 2C, 2D).
In order to verify the predictive value of the 10-gene signature, we calculated the risk scores of patients in the GSE37642 cohort (validation 1 set). We found that the results of the GSE37642 cohort were consistent with the results in the TCGA cohort, and the OS of the high-risk group was significantly lower than that of the low-risk group (P < 0.001). The AUCs for one-, three-, and five-year OS were 0.638, 0.553, and 0.532, respectively (Fig.  2F). In addition, we further verified the reliability of the model. We randomly dividing the training set into a verification 2 set (Figs. S1A-S1D) and a verification 3 set (Figs. S1E-S1H), the signature had reliable predictive ability (Fig. S1). Taking this together, the 10-gene signature was capable of predicting OS in AML. The clinical information of the patients was shown in Table S2.

Nomogram analysis results of TCGA-LAML cohort and GSE37642 cohort
In order to better evaluate the relationship between genes and prognosis in the model, we used a nomogram to analyze it. The results show that in the TCGA-LAML cohort, BNIP3, CANX, and WDFY3 have a positive correlation with OS, and BAG3, CDKN2A, DIRAS3, NRG2, PARP1, PRKCD, and VAMP3 have a negative correlation with OS (Fig. 4A). In addition, in the GSE37642 cohort, CANX, CDKN2A, NRG2, and VAMP3 have a positive correlation with OS, and BAG3, BNIP3, DIRAS3, PARP1, PRKCD, and WDFY3 have a negative correlation with OS (Fig. 5A). The calibration plots showed that the nomogram could accurately predict the one-, three-, and five-year OS (Figs. 4B-4D, Figs. 5B-5D) with a harmonious consistency (TCGA-LAML, C-index = 0.72; GSE37642, C-index = 0.66) between the predicted and observed survival.

Significant differences between high-and low-risk subgroups
The patients were scored by autophagy-related gene models, and the patients were divided into high-and low-risk groups based on the optimal score. Principal components analysis (PCA) supports the classification of AML patients into two subgroups (Fig. 6A). In order to further analyze the difference between the high-risk and low-risk subgroups, the ESTIMATE algorithm was used to analyze the TCGA-LAML tumor microenvironment. The results showed that high ImmuneScore was significantly associated with poor survival (Fig. 6B). Another important finding was that ImmuneScore and StromalScore were higher in the high-risk group (Fig. 6C). In addition, age was significantly correlated with both Immune Score and Stromal Score (Fig. 6D).
In order to explore the differences in immune infiltrating cells in the high-and low-risk subgroups, we used the CIBERSORT algorithm to analyze the composition of 22 immune cells in the TCGA-LAML cohort (Fig. S2) and analyzed the correlation between different immune infiltrating cells (Fig. S3). In addition, the difference in immune infiltrating cells between high and low-risk subgroups is shown in Fig. 6E. Further analysis showed that the high expression of mast cells resting was associated with a better prognosis and the NK cells activated with high expression was associated with a poor prognosis (Fig. 6F). PDL1 (CD274) and CTLA4 play a very important role in the immunotherapy of AML. We found that the high-risk group had higher expression levels of PDL1 and CTLA4 (Fig. 6G) (Fig. 6H). The results of drug sensitivity analysis showed that there are significant differences between 24 chemotherapy drugs between high-risk and low-risk patients, which may provide help for personalized treatment of AML patients (Fig. 7).
In this study, we first identified 10 autophagy genes related to AML patients' prognosis from the training group through univariate COX analysis, LASSO regression analysis, and multivariate COX analysis, to establish a risk score model. According to the optimal value of risk score, patients were divided into high-and low-risk subgroups. In the training group, a high-risk score was significantly correlated with poor prognosis (p = 6.975e−09). Then we conducted verification in the GSE37642 cohort, and the results supported that high-risk subgroups were significantly more related to poor prognosis (p <0.001). Next, we tested the accuracy of the model, and the results showed that the predictive performance of the model was good (Figs. 2B, 2F). Interestingly, there was a tendency of shorter survival in patients with higher risks in TCGA data but not in GSE37642 (Figs. 2C,2D,2G,2H). Testing with clinically relevant factors indicates that risk score in our model is an independent factor for AML in both TCGA-LAML and GSE37642 cohorts. Furthermore, the nomogram displayed the correlation between one-, three-, and five-year survival and these genes in the risk model. Among them, CANX, BAG3, DIRAS3, PARP1, and PRKCD are more consistent in both TCGA-LAML and GSE37642 cohorts. This is partly a reflection of the lower efficiency of TCGA-LAML cohort when compared to GSEA cohorts. Additional data could help validate and optimize the model. In addition, we analyzed the relationship between autophagy genes and immune infiltrating cells in the model, and the results showed that the high-risk subgroup had a higher level of StromalScore, ImmuneScore, and certain immune cell types compared to the low-risk subgroup, indicating that the model might have a special immune signature. Moreover, the expression level of immune checkpoint genes (CTLA4 and CD274) in patients with higher risk was higher than low-risk subgroups, suggesting this model provides more information for immune therapies like stratifying patients who are more sensitive for CTLA4 and CD274 immune therapies. Consequently, we xplored the relationship between AML and tumor environment in the TCGA-LAML cohort. We found StromalScore could not predict prognosis but higher ImmuneScore had a slightly better survival while age is a significant factor that influencing Stromal Score and Immune Score in TCGA-LAML cohort. However, for mast cells resting and NK cells activating, subgroups with relatively high-or low level had a significant different survival. Those  DIRAS3, one important gene in our risk score model, is an imprinted tumor suppressor gene that also plays a very vital role in ovarian and breast cancer (Sutton et al., 2019a;Peng et al., 2018;Sutton et al., 2019b). PRKCD is a pro-apoptotic kinase, and some miRNAs can regulate tumors by targeting PRKCD (Zhang, Xu & Dong, 2017;Yao et al., 2015;Ke et al., 2013). VAMP3 is a member of the vesicle-associated membrane protein (VAMP)/synaptobrevin family (Sneeggen et al., 2019;Chen et al., 2019;Pontes et al., 2006;Caronni et al., 2018). Consistent with these studies, our research shows that these genes are potential therapeutic targets for postoperative diseases caused by microglial activation.
However, this study has some limitations. First, our study is mainly based on TCGA data, and most of the patients are white or Asian and we should be cautious to extend our findings to patients of other races. Second, our study is a retrospective analysis, and prospective studies are necessary to verify the results. Third, the AML datasets do not have complete clinical information, which may reduce the statistical validity and reliability. Finally, verification of our model in vitro or in vivo would be beneficial.
Overall, we constructed a prognostic model of 10 autophagy-related genes through the TCGA database and verified them through the GEO database. Our results complement the existing prognostic models and can be used as potential biomarkers for AML. In addition, we provide new views on the role of autophagy genes in AML, and these autophagy genes may also be applied in clinical adjuvant therapy.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work was supported by the Medical and health research projects in Hainan Province (Grant No. 2001320243A2009). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.