Clusterization in acute myeloid leukemia based on prognostic alternative splicing signature to reveal the clinical characteristics in the bone marrow microenvironment

Alternative splicing (AS), a crucial post-transcriptional regulatory mechanism in expanding the coding capacities of genomes and increasing the diversity of proteins, still faces various challenges in the splicing regulation mechanism of acute myeloid leukemia (AML) and microenvironmental changes. A total of 27,833 AS events were detected in 8337 genes in 178 AML patients, with exon skip being the predominant type. Approximately 11% of the AS events were significantly related to prognosis, and the prediction models based on various events demonstrated high classification efficiencies. Splicing factors correlation networks further altered the diversity of AS events through epigenetic regulation and clarified the potential mechanism of the splicing pathway. Unsupervised cluster analysis revealed significant correlations between AS and immune features, molecular mutations, immune checkpoints and clinical outcome. The results suggested that AS clusters could be used to identify patient subgroups with different survival outcomes in AML, among which C1 was both associated with good outcome in overall survival. Interestingly, C1 was associated with lower immune scores compared with C2 and C3, and favorable-risk cytogenetics was rarely distributed in C2, but much more common in C1. This study revealed a comprehensive landscape of AS events, and provides new insight into molecular targeted therapy and immunotherapy strategy for AML.

With recent developments in next generation sequencing techniques, molecular evolutionary mechanisms underlying the occurrence and clinical progression of AML are better understood, and could optimize individualized treatment [7,8]. Alternative splicing (AS) is a crucial post-transcriptional regulatory mechanism that converts only 20,000 genes in eukaryotic cells into approximately 95,000 different proteins for generating protein diversity [9]. Recent studies have found that variations in tumor transcriptome due to AS changes and dysregulation of AS were closely related to tumor progression, drug resistance, metastasis and other carcinogenic processes [10]. Additionally, occurrence of some peculiar AS events may represent the phenotypic characteristics of specific malignancies. Ghigna et al. identified that the exon skip events of MST1R was related to acquisition of cellular motility during cell invasion [11]. Bechara et al. suggested that changes in an exon of NUMB can activate proliferation of lung cancer cells [12]. Nevertheless, the molecular mechanism of AS mainly depends on the regulation of splicing factors [13]. Tumor genome sequencing data demonstrated that multiple mutations have been found in spliceosomal complex members, while common in hematologic malignancies and usually occurs in SRSF2, SF3B1, ZRSR2 and U2AF1 [14]. Therefore, a good understanding of the potential function of AS could help researchers to identify new oncogenic mechanisms and therapeutic targets.
In recent decades, significant advances have been achieved in the field of tumor immunotherapy [15,16]. Bone marrow microenvironment has been identified as a major sanctuary of leukemic stem cells (LSCs) to protect these cells from conventional therapies, immune cells, hematopoietic cells and some cytokines, chemokines, etc. [17]. Moreover, increasing evidence suggested that AS is another hallmark in immune microenvironment formation [18]. Studies have shown that splicing changes can be used to distinguish different types or subtypes of tumors and are related to clinical stages and prognosis [19,20]. Therefore, better evaluation of bone marrow microenvironment and distribution and function of splicing events are essential to improve the efficacy of immunotherapy.
Currently, RNA-seq for detecting AS events is frequently used, and specific implementation focuses on bioinformatics analysis of data [21]. With rapid accumulation of sequencing data, public databases provide wealthy resources for systemic biology. In this study, we systematically assessed genome-wide AS patterns and evaluated their associations with clinical outcomes of AML. Furthermore, we discerned distinct clusters of AML based on survival-associated events and investigated the relationship between clusters and clinical characteristics of bone marrow immune microenvironment. Our study revealed a comprehensive landscape of AS events, and provided new insight into molecular targeted therapy and immunotherapy strategy for AML.

Overview of AS events in AML
The profile of AS events for 178 patients with AML were analyzed from TCGA cohort. The clinical and molecular characteristics of these cases were summarized in Additional file 1: Table S1. The median age at diagnosis was 55 years (range, 18 to 88), and the median follow-up duration was 16.4 months (range, 1 to 118).
After preprocessing procedure, integrated RNA-Seq data with AS events in seven splicing types were included in the present study. We detected a total of 27,833 AS events of 8,337 genes, comprising 2,136 AA in 1641 genes, 1724 AD in 1367 genes, 5588 AP in 2760 genes, 6317 AT in 3173 genes, 9985 ES in 4762 genes, 135 ME in 133 genes, 1948 RI in 1335 genes, as illustrated in Fig. 1a. A schematic diagram of AS events are shown in Fig. 1b. It should be noted that several mRNA splicing events may be detected in a single gene, and up to 6 types of AS events were observable for one gene (Fig. 1c). In addition, most genes have more than one type of AS events. ES was the predominant type of AS events in AML (36.12%), and these events may provide perhaps a critical process for enriching transcriptome diversity.

AS events associated with prognosis of AML
We performed univariate Cox analysis to evaluate the impact of AS type on OS in AML patients. A total of 3016 survival-related AS events within 1992 genes were identified. Hazard ratios (HRs) greater than or less than 1 accounted for 2.74% and 5.88% of the total AS events, respectively (Additional file 2: Table S2). Meanwhile, several survival-associated AS events can also be detected in one single gene, and different spliceosomes of a few parent genes (such as NBPF11, LRRC23, etc.) exhibited opposite prognostic effects in AML (Additional file 3: Figure S1). Top 20 most significant survival-associated AS events of each type were presented in Fig. 2.
Subsequently, independent prognostic AS events for seven types were used to construct prognostic predictors models. LASSO analysis was used to screen the optimal combination (Additional file 9: Figure S3). The Kaplan-Meier analysis indicated that these molecular signatures of AS events can be used to differentiate patients with distinct prognosis. To compare the efficiency of eight predictive models, ROC curves were applied into each model. The AUC of AA, AD, ES, AT, AP, RI, ME and marge-AS models was 0.912, 0.875, 0.834, 0.829, 0.820, 0.819, 0.738, and 0.867, respectively. AA was related with better predictive performance in AML, showed in Additional file 4: Figure S2. Detailed data of models based on each type of AS signature is listed in Additional file 5: Table S3.

Clinical characteristics of bone marrow immune microenvironment
To evaluate the relationship between AS and bone marrow microenvironment, we analyzed immune scores of AML patients by using a bioinformatics tool. Immune scores ranged from 1645.34 to 4145.87, and stromal scores form −1809.22 to 333.04. To assess the correlation of OS and DFS with scores, we divided patients into low-and high-score groups using the median of immune/stromal scores as cutoff. As shown in Fig. 3, the survival distribution curves showed that high immune scores correlated with worse OS (P value = 0.0215) and DFS (P-value = 0.0058) than low immune scores (Fig. 3a,  b). However, patients with lower stromal scores only had numerically longer OS (P-value = 0.6896) and DFS (P-value = 0.1138) (Figure c, d). Since previous studies have indicated that immune conditions are associated with clinical characteristics, we investigated the relationships of gender, age and cytogenetic risk classification with immune and stromal scores. Immune scores were greater in the elderly group (P-value = 0.0013), while stromal scores were independent of age (P-value = 0.1196) and gender (P-value = 0.5156) (Fig. 3e-h). Intriguingly, the average immune scores of the favorable cytogenetics subtypes ranked the lowest in risk classification, which is consistent with findings above (Fig. 3i, k). To reveal the biological basis involved in different bone marrow

Network of survival-associated AS events and splicing factors
Splicing factors (SFs) are the key players of AS events and promote differential splicing patterns under stress conditions [22]. To understand the regulatory pattern of SFs in AML, we constructed the interaction networks of OS-related AS events and SFs (Fig. 4). A total of 117 AS events including 56 AS events with inferior prognosis (red dots) and 61 AS event with protective activity (green dots) were identified and correlated significantly with the levels of SF expression (purple triangle). Figure 5a shows several examples of the relationship between SF expression levels and PSI values for AS events, with more details provided in Additional file 6: Table S4. We observed that many SFs were correlated with different type of AS events and may have opposite effects, indicating that SFs may negatively (green line) or positively (red line) regulates OS-related AS events. Similarly, partial AS events could be regulated by multiple SFs simultaneously, this phenomenon partly explains the multiple AS events that can occur in the same transcript.
Since abnormal expression of SFs in cancer is a potential mechanism for regulating AS events, we analyzed the relationship between the observed changes in SFs expression and promoter methylation. Among the 89 SFs, the promoters of 72 SFs were highly hypermethylated and negatively correlated with their corresponding mRNA expression in AML patients. Figure 5b shows some highly relevant examples including FAM50B, SRSF8, HNRNPF, CPSF6, etc., with more details provided in Additional file 7: Table S5. In addition, we observed that 19.46% of cases in the AML cohort had at least one SF copy number variation. Circos plot shows the chromosome position information of partial SFs (Fig. 5c). This result revealed that SFs could be regulated by epigenetics, and further increase the diversity of AS events.

AS-based clustering was associated with clinical characteristics and immune features
Our research observed that the PSI values of each AS event varied among AML individuals. To better understand the molecular heterogeneity of AML, we explored patterns of AS based on survival-associated events by performing consensus unsupervised analysis of all samples. As a result, three clusters of samples were determined ( Fig. 6a, b): C1 (n = 60, 33.7%), C2 (n = 29, 16.3%), C3 (n = 89, 50.0%). To discern the characteristics of different subtypes, we first investigated the relationships between AS clusters and immune microenvironment. The results showed that C1 was associated with lower immune scores compared with C2 and C3 (C1 vs. C2, P-value = 0.0188; C1 vs. C3, P-value = 0.0463), see Fig. 6c. However, no significant difference was found between clusters and stromal scores (Fig. 6d). Additionally, we compared clinical characteristics (gender, age, cytogenetics risk category, FAB subtype) between clusters. As showed in Fig. 6e, favorable-risk cytogenetics was rarely distributed in C2 (3.57%), but much more common in C1 (30.0%). To explore the association between AS clusters and immune features, the landscape of 22 immune cells abundance within the bone marrow microenvironment of each AML case were plotted. Furthermore, Kaplan-Meier analysis was performed to assess the relationship between clusters and survival status (OS and DFS). The results suggested that AS clusters could be used to identify patient subgroups with different survival outcomes (Fig. 6f, g), among which C1 was both associated with good outcome in OS (C1 vs. C2, P-value < 0.0001; C1 vs. C3, P-value < 0.0001) and DFS (C1 vs. C2, P-value = 0.0159; C1 vs. C3, P-value = 0.0338) analysis, followed by C3 and C2. The overall median survival for clusters (C1-C3) was 31.5 months, 7.16 months, and 14.13 months, respectively.
To further reveal the molecular characteristics of samples with AML in TCGA cohort, we performed a comprehensive molecular analysis of the mutation pattern SNPs, mutation information of each gene was exhibited according to different classified categories. Among all variant classification, missense mutations accounted for the largest proportion, followed by frame shift mutation and nonsense mutation (Fig. 7a); insertion or deletion occurred less frequently than single nucleotide polymorphism (Fig. 7b). The C > T transversions was the predominant of single nucleotide variants (SNVs) in AML (Fig. 7c). The most frequently mutated genes were exhibited in Fig. 7d, e, and the top ten includes DNMT3A, NPM1, TP53, KIT, RUNX1, FLT3, IDH2, WT1, TTN, and IDH1. Based on the AS events clustering, we found that the distribution of common mutated genes was inconsistent in different clusters (Additional file 8), the mutation frequency of DNMT3A in cluster C1-C3 was 10%,   13.8% and 11.2%, respectively (Fig. 7f ); and NPM1 was 5%, 13.4% and 10.1%, respectively (Fig. 7g). Notably, no TP53 mutations were detected in C1 (Fig. 7h), which might be related to the good prognosis of C1 or the insufficient number of TP53 mutations samples. These results indicate that distinct patterns of AS are associated with different molecular characteristics.
Immune checkpoint inhibitors are considered a promising treatment strategy in AML [23]. We investigated the relationship between the expression of immune checkpoints and AS clusters. As shown in Fig. 8a, CD279, CD276, CD27 were significant different in expression distribution among the three clusters of samples. To explore the immune characteristics in bone marrow microenvironment, we generated a bar chart to illustrate the distribution of 22 immune cells in each sample by CIBERSORT algorithm (Fig. 8b). The results revealed that the most representative cell composition in bone marrow microenvironment of AML patients were monocytes, T cells CD4 memory resting, mast cells resting, B cells naive, and eosinophils. We observed that the composition of mast cells resting and B cells naive were slightly higher in C2 than C1 or C3. Collectively, these findings suggested that AML displayed distinct patterns of survival-associated AS events, and splicing events are ubiquitous and influence clinical outcome. Our findings provide new insight into molecular targeted therapy and immunotherapy strategy for AML.

Discussion
AML is the most common hematologic malignancy, with multiple molecular subtypes and cellular heterogeneity [24]. Over the last decade, significant efforts have been made to elucidate the molecular changes in genome-wide profiling involved in AML oncogenesis. Such studies have contributed to the determination of relevant biomarkers and even therapeutic targets, including protein coding genes, microRNAs and long non-coding RNAs [25][26][27]. In addition, as a crucial post-transcriptional regulatory mechanism in expanding the coding capacities of genomes and increasing the diversity of proteins, AS events have been shown to have the potential in predicting clinical outcomes.
Our whole workflow is shown in Fig. 9, this study revealed a comprehensive landscape of AS events and explored the relationship between splicing patterns and the prognosis of AML patients. A total of 27,833 AS events were detected in 8337 genes, of which ES events were the predominant type, accounting for more than 1/3 of the total AS events. Subsequently, we performed the univariate analysis for OS to evaluate the prognostic impact, and found that approximately 11% of AS events were significantly related to the clinical outcomes in AML. We found that several AS events from the same parent gene had the opposite prognosis effects. Among the survival-related AS events, AA was found to have the optimal performance in the prediction models with high classification efficiencies. AS event changes have been widely recognized in tumors that affect the protein interaction networks regulated by SFs [22,28]. We constructed SF correlation networks to elucidate the potential mechanism of splicing pathway, and found that SFs influence disease progression by regulating the AS of downstream target genes simultaneously. Moreover, we calculated microenvironment scores by ESTIMATE algorithm. The results showed that a high immune score was an unfavorable prognostic factor for AML, and younger groups or favorable cytogenetic subtypes had lower immune scores. This is consistent with the findings of current research and supports the reliability of our results [29]. In addition, we also determined the association of AS events with immune/stromal scores to reveal the pathways involved in low-and high-immune groups. More importantly, due to the biologic and genetic heterogeneity of AML, three AS-based clusters were identified (C1-C3). Interestingly, we found that the AS-based clusters presented distinct immune features, and the outcome of survival changed accordingly, just as the C1 subgroups with low immune scores having poor clinical outcome (OS and EFS). To further reveal the association between molecular characteristics and ASbased clusters, we performed a comprehensive analysis of the mutation pattern SNPs, and investigated the relationship between the expression of immune checkpoints and AS clusters, which may be helpful in determining the ideal candidates for treatment and the optimal immunotherapy strategy.
At present, a growing number of researchers are working to develop therapies for AS. Neoantigens derived from AS have greatly expanded the pool of tumor-specific target antigens [30]. Although the identification, screening and clinical use of neoantigens still face various challenges, neoantigens with tumor tissue specificity and high immunogenicity are expected to benefit more patients as potential targets for cellular immunotherapy [31,32]. Previously, TCGA as the principal guide to understanding of the complex tumor biology [33], large-scale sequencing data were used to identify prognostic AS events and revealed pathogenesis through regulatory splicing networks in colorectal cancer and head-neck carcinoma [34,35]. More and more AS-related prognostic models have been established, showing excellent efficacy in various cancers. Researcher have used splicing changes to construct predictive models for different types of tumors, such as hepatocellular carcinoma, glioblastoma, cervical cancer, sarcoma, and esophageal carcinoma [36][37][38][39][40]. Interestingly, AML has been identified to share some common prognostic markers of AS with other tumors. Chen et al. demonstrated the survival-related AS events based on TCGA database, and they provided an overview of misregulated AS events in different types of AML [41]. In the meantime, Jin et al. reported that the AUC of ROC curve for the final prediction model constructed with 15 AS events was 0.931, which might refine risk stratification of the European Leukemia Net (ELN) [42]. But they did not explore the association between the AS events and the clinical characteristics of AML patients. This work was actually greatly increases the confidence in this important breakthrough [43]. Another argument for AS events was critical to improve the understanding of tumor resistance as the destruction of the activity of the splicing factor SRSF3 could lead to drug resistance in the immunotherapy of leukemia [44]. Although a large number of tumor-specific AS events can be obtained through transcriptome data analysis, not all splicing events can be translated into proteins. Further analysis is required to determine the presence of splicing isomers by protein spectroscopy. Currently, there are two application strategies for AS events as a therapeutic target, one being to design an antisense nucleotide for specific AS events to restore the phenotype of normal cells [12,45], and the other being to apply small-molecular compound to regulate SFs [46]. However, splicing itself can also be used as a direct drug target, MET exon14-skipping have been found in non-small-cell lung cancer patients who are sensitive to MET targeted therapy [47]. Moreover, these therapeutic strategies may have different effects depending on the tumor type and SFs expression or mutation [14,22]. Our study reflects some manifestations of AML, we have highlighted focused on subtype clustering of AML patients with AS events. For adoptive immunotherapy, the tumor specificity of antigen

Conclusions
Our study analyzed the landscape of AS events and identified their association with clinical outcome. Our findings support the notion that splicing events are ubiquitous in AML patient. SFs correlation networks further altered the diversity of AS events through epigenetic regulation and clarified the potential mechanism of the splicing pathway. Moreover, clustering based on prognostic AS signature revealed the clinical characteristics of bone marrow immune microenvironment. This discovery provided new insight into molecular targeted therapy and immunotherapy strategy for AML.

Data curation process
RNA sequencing profiles and corresponding clinical information were available at TCGA portal data (https ://porta l.gdc.cance r.gov/). Meanwhile, we evaluated the RNA splicing patterns from 178 AML patients using SpliceSeq software (https ://bioin forma tics.mdand erson .org/TCGAS plice Seq/) and generated the AS profiles of genes for each patient [48]. The ratio between reads including or excluding exons, also known as Percent Spliced In (PSI) value, was calculated for each detected AS events [49]. Seven different types of splice events were identified including Alternate Acceptor site (AA), Alternate Donor site (AD), Alternate Promoter (AP), Alternate Terminator (AT), Exon Skip (ES), Mutually Exclusive Exons (ME), and Retained Intron (RI). We set strict selection criteria (Percentage of samples with PSI value ≥ 0.75, and average PSI value > 0.05) to generate a reliable set of AS events. To describe an AS event precisely, every AS event was assigned a unique annotation with gene symbol, the ID number from SpliceSeq database and splicing pattern.

Prognostic signatures for alternative splicing events
We used Perl language (https ://www.perl.org/; version 5.30.0) to extract the matrix file and match array with clinical follow-up data. Univariate Cox regression analysis was performed to screen prognostic related AS events in AML. The z-score is positive if the value lies above the mean, and negative if it lies below the mean [50], and bubble plots were used to visualize data. The interactive sets between seven types of AS were drawn by the 'UpsetR' package in R 3.5.2 software (https ://www.r-proje ct.org/). The major codes used in this study could be found in the Additional file 8. The least absolute shrinkage and selection operator (LASSO) regression were performed to screen the most significant AS events, and all variables left after elimination are considered selected [51]. We performed the penalty parameter lambda by the cross-validation using the R package 'glmnet' . The optimal lambda value corresponding to the minimum value of the cross-validation error mean was screened to determine the potential survival-related AS events (Additional file 9: Figure S3). The optimal survival-related AS events in each AS types were used to constructed predictive models by using multivariate cox regression. Furthermore, AML patients were divided into two groups by using the median risk score, the formula is as follows: RiskScore = PSI of AS1*β1AS1 + PSI of AS2*β2AS2 +… PSI of ASn*βnASn. Kaplan-Meier (K-M) survival analysis, receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to assess the predictive accuracy for each prognostic signature. The P-value was computed using log-rank test.

Splicing factor regulated network construction
A list of 382 splicing factors(SF) genes were referenced from a previous study and relevant databases [52,53]. The expression profiles of SFs were obtained from TCGA and normalized by using all housekeeping genes with log base 2 transformed. Pearson correlation test was used to evaluate the correlation between SFs expression and PSI values of survival-associated AS events, and P-values were adjusted by Benjamini & Hochberg (BH) correlation (P-value < 0.05, |Pearson's coefficient| > 0.65). Cytoscape (version 3.7.1, http://www.cytos cape.org/) is a helpful tool for visualizing molecular interaction network and observing the correlation between molecules, we used to plot the AS regulatory network. The methylation beta values of samples measured by the infinium human methylation 450 K platform were downloaded from TCGA portal. Moreover, correlation analysis was performed between SFs methylation and SFs mRNA expression, and the copy number variation (CNV) of SFs was calculated in AML.

Immune characteristics analysis
The immune and stromal scores of each sample were calculated by applying the ESTIMATE algorithm [54] to assess bone marrow immune activity, and AML cases were divided into low-and high-score groups according to the median of immune/stromal scores. Gene set enrichment analysis (GSEA, version 4.0.1, http://softw are.broad insti tute.org/gsea/) is a knowledge-based method which determines whether a particular set of functionally related genes shows statistically significance, and we used it to verify the differences in molecular pathways between low-and high-immune score groups [55]. The 'c2.cp.kegg.v7.1.symbols.gmt' was selected as the database of gene-sets and 1000 permutations were performed to generate the null distributions. In order to evaluate the bone marrow microenvironment in AML patients, we estimated the abundance of various types of immune cell by using the CIBERSORT algorithm [56], which is a versatile computational method for quantifying 22 immune cell types from bulk tissue transcriptomes. Each sample in the data set were performed 100 times for further study, and samples with a P-value < 0.05 was set as the cutoff.

Evaluation of the correlation with clinical features
To visualize the associations between survival-associated AS events and the heterogeneity of AML patients, we clustered the AML into different groups with hierarchical consensus clustering by using the R package 'Consensus Cluster Plus' [57]. In our study, the cumulative distribution function (CDF) curve was used to determine the best number of clusters, parameters were set as default except that max K was set at 9. We selected three sub-types (k = 3, C1-C3) that optimally fit the data. The association between clinical variables and sub-types, including gender, age, cytogenetics risk category, French American British (FAB) subtype, and bone marrow immune features, was analyzed. Kaplan-Meier curves and log-rank test were used to compare the overall survival (OS) and disease free survival (DFS) of different subtypes. Moreover, single nucleotide polymorphism (SNP) and molecular alteration of AML were described by using R package 'maftools' (version 2.2.10). Molecules with high mutation frequencies (DNMT3A, NPM1, and TP53) were included in the evaluation subtype.

Statistical analyses
GraphPad Prism (version 8.02) and R software (version 3.5.2) were used for analysis. Pearson' s correlation test and spearman' s correlation test were employed in the correlation analysis. The Student's t-test was utilized to compare the differences in variables between groups, and P-value less than 0.05 were deemed statistically significant.