A machine learning model identifies M3-like subtype in AML based on PML/RARα targets

Summary
The typical genomic feature of the acute myeloid leukemia (AML) M3 subtype is the PML/RARα fusion, and ATRA/ATO-based combination therapy is the current standard treatment for this subtype. Here, we developed a machine-learning model based on the expression of PML/RARα targets to identify M3 patients by analyzing 1228 AML patients, and the model showed high accuracy. To enable more non-M3 AML patients to potentially benefit from ATRA/ATO therapy, we further identified M3-like patients. M3-like patients resembled M3 patients in the expression patterns of M3 subtype marker genes, the proportion of myeloid progenitor cells, and the deconvolved composition of AML cell populations, and they showed strong GMP features. M3-like patients also exhibited distinct genomic features, low immune activity, and better clinical survival. Proactively identifying patients similar to the M3 subtype may help find more patients who would benefit from ATO/ATRA treatment and deepen our understanding of the molecular mechanisms of AML pathogenesis.


INTRODUCTION
Acute myeloid leukemia (AML) results from the clonal expansion of hematopoietic precursor cells carrying disease-causing genetic mutations or chromosomal changes. 1-3 Acute promyelocytic leukemia (APL) is a distinct subtype of AML characterized by the expansion and accumulation of leukemic cells blocked at the promyelocytic stage of granulocyte differentiation, and by the presence of a specific disease-driving fusion gene encoding the PML/RARα oncoprotein. 4 An atlas of PML/RARα direct targets has been identified, which redefined its activating function as acting through super-enhancers and explained the synergism of ATRA and ATO. 5 Morphologically, APL is recognized as the M3 subtype of AML in the French-American-British classification. 6-8 Among the AML subtypes, M3 has the highest survival rate, 9 which is attributed to the combination therapy of ATRA and ATO. ATRA and ATO trigger the degradation of PML/RARα, thereby inhibiting disease progression, whereas non-M3 AMLs respond inconsistently to this combination therapy. 10,11 Through an in-depth study of PML/RARα target genes, we aimed to explore which non-M3 AML patients may benefit from ATRA and ATO combination therapy.
In myeloid tumors, chromosomal translocations usually involve transcription factors (TFs); the resulting oncogenic fusion TFs abnormally regulate downstream target genes, induce malignant cell proliferation, and interfere with differentiation in the bone marrow. 12 PML/RARα is the most important oncogenic fusion in APL; both of its partners, PML and RARα, are TFs, and the fusion protein can directly trans-activate essential oncogenes that play important roles in APL progression. 13,14 On the other hand, PML/RARα can also suppress the expression of tumor suppressor genes. For example, PML/RARα inhibits PU.1-dependent activation of immune subunits, thereby contributing to the escape of APL cells from immune surveillance. 15,16 Both ATRA and ATO directly target PML/RARα-mediated transcriptional repression and protein stability. 20,21 The genes affected by these drugs are important for APL cell differentiation or proliferation. In M3 subtype patients, the combination therapy has a synergistic effect on the induction of myeloid differentiation and apoptosis. 23-25 Overall, ATRA-ATO combination therapy shows good therapeutic efficacy and low drug resistance in M3 subtype patients and in certain non-M3 AML cell lines. These results suggested that, besides M3 patients, other AML patients with expression patterns similar to those of the M3 subtype might also benefit from the ATRA-ATO combination treatment strategy. The good therapeutic efficacy of ATRA-ATO might depend not only on the PML/RARα fusion event but also on the expression features of its target genes.
Besides the PML/RARα fusion event, the expression or genomic patterns of several other genes also aid in the subtype characterization and treatment of AML. For example, FLT3, which is mutated in approximately 40% of human APL cases, cooperates with PML/RARα in the development of the APL phenotype in mice. 26 As another example, the expression of the peptidyl-prolyl cis-trans isomerase Pin1 is significantly increased in patients with various AMLs, including the M3 subtype, and Pin1 has been implicated in a variety of cancer pathways in AML. 27 Given these mutations and expression alterations, we reasoned that, in addition to PML/RARα fusion events, other molecular alterations might also be involved in the occurrence and progression of APL or AML. The transcriptomic characteristics of APL still need further study. We therefore reasoned that gene expression profiles can be used to explore similarities between the M3 subtype and other AML subtypes, and that AML subtypes similar to M3 in gene expression might merit the same treatment strategy.
In this study, we found that PML/RARα targets tend to be differentially expressed in multiple AML subtypes and contribute to the classification of the M3 subtype. Because the expression of PML/RARα targets is a downstream consequence of PML/RARα regulation, we hypothesized that our computational approach could complement the current classification of the M3 subtype from a transcriptomic perspective and further uncover therapeutic vulnerabilities in additional subpopulations, which in turn could help identify pathogenic mechanisms. We therefore developed an enrichment-based scoring index, the M3-Like Score (M3-LS), to assess how closely the expression pattern of PML/RARα target genes in non-M3 AML patients resembles that of the M3 subtype. We further developed a classifier that identifies patients similar to the M3 subtype as those scoring above a threshold chosen by receiver operating characteristic (ROC) curve analysis. Further requiring PML/RARα targets to respond to ATRA/ATO or to be differentially expressed in the M3 subtype improved the performance of the classifier, and the robustness of the model was validated in independent AML cohorts. Notably, the expression patterns of several vital marker genes in M3-like patients were more concordant with the M3 subtype. Moreover, M3-like patients exhibited several features distinct from other non-M3 patients, including genomic mutations, molecular immune features, and survival prognosis. Together, these results indicate the value of identifying an M3-like subtype by transcriptome analysis and suggest that these patients may also benefit from ATRA/ATO therapy.

PML/RARα targets are perturbed across AMLs and help identify M3 subtype
From a previous study, 5 we obtained 363 repressed and 424 activated PML/RARα target genes in the M3 subtype, identified by integrating transcriptomic and PML/RARα regulatory data from the NB4 cell line (Figure 1A). We then performed differential expression analysis in the training AML cohort by comparing patients of each subtype with normal samples (FDR <0.05, |FC| > 1.5), and used hypergeometric tests to evaluate whether PML/RARα target genes were enriched among the differentially expressed genes. PML/RARα target genes were significantly enriched in the M3 subtype (p < 1.55e-13), with approximately 22.62% of targets significantly abnormally expressed (Figure 1B). The enrichment p value was most significant in the M3 subtype, although significant enrichment was also observed in other subtypes (Figure 1B). Changing the thresholds of the differential expression analysis gave similar results (Figures S1A and S1B). Taking the target gene WT1 as an example, Figure 1C shows the effect of PML/RARα on this directly activated gene. WT1 was significantly over-expressed in M3 (FDR <2.84e-62, FC = 6.73) and in several other subtypes (Figure S1C); it is a known significant predictor of AML recurrence 28 and an important marker for detecting minimal residual disease in AML. 29 The frequency of WT1 mutation was 6.8% in validation cohort-1 and 4.5% in the M3 subtype. Only 0.25% of the target genes were differentially expressed between the WT1-mutant and wild-type groups (Figure S2), suggesting that WT1 mutation has no significant effect on the expression of PML/RARα targets. STAB1, another example, was also significantly up-regulated in the M3 subtype (FDR <7.84e-89, FC = 5.46, Figure S1D), and reducing its expression inhibits the growth of NB4 leukemia cells. 30
STAB1 is also a poor prognostic factor in AML, and its oncogenic functions have been confirmed in melanoma. 31 To further understand the functional roles of PML/RARα target genes, we next performed functional enrichment analysis (FDR <0.05). Differentially expressed targets across AML subtypes were significantly enriched in myeloid cell differentiation, activation, and immune regulation-related functions (Figure 1D). For example, differentially expressed PML/RARα target genes in the M3 subtype were significantly enriched in leukocyte migration, myeloid leukocyte activation, and T cell activation (Figure 1D). We further explored whether the expression patterns of PML/RARα targets can help distinguish M3 patients from other subtypes. After t-SNE dimensionality reduction, almost all M3 patients clustered together and were clearly separated from other subtypes (Figures 1E, S1E, and S1F), suggesting that PML/RARα targets exhibit M3-specific expression patterns and could help identify more M3 patients. Moreover, certain samples from other subtypes clustered together with M3 patients, implying that these patients have expression patterns similar to the M3 subtype even though they lack the PML/RARα fusion event.
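The differential-expression filter used above (FDR < 0.05, |FC| > 1.5) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-gene p values and linear fold changes have already been computed, assumes Benjamini-Hochberg as the FDR procedure (the correction method is not stated here), and reads |FC| > 1.5 as a change beyond 1.5-fold in either direction. Function names and gene labels are hypothetical.

```python
import math
from typing import List, Set


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(
    genes: List[str],
    pvalues: List[float],
    fold_changes: List[float],
    fdr_cutoff: float = 0.05,
    fc_cutoff: float = 1.5,
) -> Set[str]:
    """Genes with FDR < fdr_cutoff and a linear fold change beyond fc_cutoff in either direction."""
    fdr = benjamini_hochberg(pvalues)
    log_cut = math.log2(fc_cutoff)
    return {
        gene
        for gene, q, fc in zip(genes, fdr, fold_changes)
        if q < fdr_cutoff and abs(math.log2(fc)) > log_cut
    }


# Hypothetical toy input: four genes with precomputed p values and fold changes.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
pvals = [1e-10, 1e-8, 0.04, 0.5]
fcs = [6.0, 0.2, 1.2, 0.4]
print(differentially_expressed(genes, pvals, fcs))
```

Working on the log2 scale makes the up- and down-regulation thresholds symmetric, so a 0.2-fold change passes just as a 5-fold change would.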
Together, these results suggest that the expression of PML/RARα targets is likely perturbed in multiple AML subtypes, and that the expression patterns of these targets can greatly help identify patients similar to the M3 subtype.
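The enrichment of PML/RARα targets among differentially expressed genes was assessed with hypergeometric tests. A minimal standard-library sketch of the one-sided upper-tail p value is shown below; the universe size and differentially expressed gene count in the usage line are hypothetical placeholders, and only the 787 targets (363 repressed plus 424 activated) and the roughly 22.62% overlap come from the text.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_targets: int, n_de: int, n_overlap: int) -> float:
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric(N=n_universe, K=n_targets, n=n_de).

    n_universe: all genes tested; n_targets: PML/RARα targets in the universe;
    n_de: differentially expressed genes; n_overlap: targets that are differentially expressed.
    """
    upper = min(n_targets, n_de)
    # Exact integer arithmetic; Python's int / int division handles huge values without overflow.
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_de)


# Hypothetical universe (20000 genes) and DE count (3000); the 787 targets and
# ~22.62% of them (178) overlapping follow the numbers in the text.
p = hypergeom_enrichment_p(n_universe=20000, n_targets=787, n_de=3000, n_overlap=178)
print(f"enrichment p = {p:.3g}")
```

Summing exact binomial coefficients avoids the precision loss of subtracting a cumulative probability from 1 when the tail is very small.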

M3-LS model accurately predicts M3 subtype in AML
We next hypothesized that a patient in whom PML/RARα-activated target genes tend to be up-regulated and repressed target genes tend to be down-regulated is more similar to the M3 subtype. Based on the expression pattern of PML/RARα targets, we developed a computational model, M3-LS, to predict M3 patients. Applied to the training AML cohort (Table 1), M3-LS distinguished M3 patients from other subtypes with an AUC of 0.813 (Figure 2A). We also trained random forest and XGBoost models using M3-LS as features in the training cohort, and their AUCs reached 1.00 and 0.979, respectively (Figure 2A). At a normalized M3-LS cutoff of 0.560, the sensitivity reached 0.841. Using this cutoff, 73% of M3 patients were successfully predicted (Figure 2B). In addition, the normalized M3-LS was highest in M3 patients compared with other subtypes (Figure 2C, p < 1.9e-15, Wilcoxon's rank-sum test).
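The exact M3-LS formula is given in the STAR Methods and is not reproduced in this section. As an illustrative stand-in only, the sketch below implements a simple rank-based directional score in the same spirit: within each sample, the mean rank of PML/RARα-activated targets minus the mean rank of repressed targets, followed by min-max normalization across samples and the 0.560 cutoff reported above. The scoring rule, function names, and toy genes are assumptions, not the authors' method.

```python
from typing import Dict, Iterable, List


def within_sample_ranks(expr: Dict[str, float]) -> Dict[str, float]:
    """Rank genes by expression within one sample, scaled to (0, 1]; highest expression -> 1.0."""
    ordered = sorted(expr, key=expr.get)  # ascending; ties keep insertion order
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}


def directional_score(expr: Dict[str, float], activated: Iterable[str], repressed: Iterable[str]) -> float:
    """Mean rank of activated targets minus mean rank of repressed targets (hypothetical M3-LS stand-in).

    Assumes at least one activated and one repressed target is present in the sample.
    """
    ranks = within_sample_ranks(expr)
    up = [ranks[g] for g in activated if g in ranks]
    down = [ranks[g] for g in repressed if g in ranks]
    return sum(up) / len(up) - sum(down) / len(down)


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores across samples to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Toy example with hypothetical genes: A1/A2 activated targets, R1/R2 repressed targets.
activated, repressed = ["A1", "A2"], ["R1", "R2"]
samples = [
    {"A1": 9, "A2": 8, "R1": 1, "R2": 2, "X": 5},  # M3-like direction
    {"A1": 1, "A2": 2, "R1": 9, "R2": 8, "X": 5},  # opposite direction
    {"A1": 5, "A2": 1, "R1": 9, "R2": 2, "X": 4},  # mixed
]
raw = [directional_score(s, activated, repressed) for s in samples]
normalized = min_max_normalize(raw)
predicted_positive = [s >= 0.560 for s in normalized]  # cutoff taken from the text
print(normalized, predicted_positive)
```

Within-sample ranks make the score insensitive to platform-specific scaling, which is one reason rank-based scores are often preferred when cohorts come from different assays.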
Moreover, the M3-LS model successfully distinguished M3 patients from other subtypes in two independent AML cohorts assayed on different platforms. In the first validation cohort, the AUCs of the three classifiers reached 0.852, 0.797, and 0.809, respectively (Figure 2D). Similarly, approximately 75% of M3 patients were successfully predicted (Figure 2E), and their scores were significantly higher than those of other subtypes (Figure 2F, p < 3.5e-08, Wilcoxon's rank-sum test). Our model was also validated in the second cohort (Figures 2G-2I). These results indicate that the M3-LS model, which integrates expression patterns with regulatory information, can accurately predict the M3 subtype in AML from the transcriptome. That is, in addition to the genomic PML/RARα fusion event, the perturbed expression patterns of its target genes also reflect the molecular signature of the M3 subtype.
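AUC values such as those above can be read as the probability that a randomly chosen M3 patient scores higher than a randomly chosen non-M3 patient, which is the Mann-Whitney U statistic rescaled to [0, 1]. The sketch below computes that quantity and the sensitivity at a given cutoff; the toy scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties counted as 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at(pos: Sequence[float], cutoff: float) -> float:
    """Fraction of positives (e.g., M3 patients) scoring at or above the cutoff."""
    return sum(1 for p in pos if p >= cutoff) / len(pos)


# Hypothetical normalized scores.
m3_scores = [0.9, 0.7, 0.6, 0.4]
other_scores = [0.5, 0.3, 0.2, 0.6]
print(auc_mann_whitney(m3_scores, other_scores))  # 0.84375
print(sensitivity_at(m3_scores, 0.560))           # 0.75
```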

Performance of M3-LS model is improved by integrating ATO/ATRA response genes
The combination of ATO and ATRA is a landmark treatment regimen in M3 AML. 32,33 An increasing number of studies have also revealed that treatment can alter the expression of PML/RARα target genes and subsequently perturb downstream biological functions. 10,20,22 We next explored to what extent the M3-LS model could be refined by integrating ATO and ATRA treatment datasets (Table 2). We first obtained genes responding to ATO or ATRA treatment, as well as genes abnormally expressed in M3 patients. ATRA treatment significantly down-regulated 448 genes and up-regulated 414 genes (Figure 3A). In addition, 61 genes responded to ATO treatment, and 671 and 407 genes were down-regulated and up-regulated, respectively, in M3 AML compared with normal samples (Figure 3A). Combining these with the PML/RARα target gene set, we obtained 109 refined target genes, including 61 activated and 48 repressed genes (Figure 3A). When the M3-LS model was re-trained on these refined PML/RARα target genes, M3 patients in the training cohort were distinguished from other subtypes with higher accuracy (AUC = 0.965, Figure 3B). In particular, the AUCs of the refined random forest and XGBoost classifiers reached 1.00 and 0.999, respectively (Figure 3B). Approximately 86.89% of M3 patients were successfully predicted, and their scores were significantly higher than those of other subtypes (Figure 3C, p < 2.22e-16, Wilcoxon's rank-sum test). We evaluated the robustness of the M3-LS model in three ways. First, to evaluate the effect of sample size, we trained the model on randomly selected subsets of 10%-100% of patients; the model reached high AUCs across subset sizes, even with small numbers of patients (Figure 3D). Second, because non-M3 patients greatly outnumber M3 patients, we randomly selected the same number of non-M3 patients as M3 patients to eliminate the effect of class imbalance, and repeated this process 1000 times. The model again achieved high AUCs, ranging from 0.95 to 0.98 (Figure 3E). Finally, the refined model also showed marked improvements in the two validation cohorts (Figures 3F-3I), with AUCs of up to 0.99 and 0.939, respectively (Figures 3F and 3H). Similarly, M3 patients had significantly higher normalized M3-LS than other patients (Figures 3G and 3I). These results support that integrating PML/RARα targets with ATO/ATRA response genes further refines our model, and show that the M3 subtype can be distinguished from other AML at the transcriptome level.
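The class-imbalance check (drawing as many non-M3 patients as there are M3 patients, 1000 times) can be sketched as repeated balanced subsampling of fixed scores. This is a hypothetical illustration: it re-evaluates precomputed scores on each subsample rather than retraining any model, and the seed, helper names, and toy scores are assumptions.

```python
import random
from typing import List, Sequence


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_auc_distribution(
    m3_scores: Sequence[float],
    non_m3_scores: Sequence[float],
    n_rounds: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Repeatedly draw as many non-M3 scores as there are M3 scores and recompute the AUC."""
    rng = random.Random(seed)
    k = len(m3_scores)  # requires k <= len(non_m3_scores)
    aucs = []
    for _ in range(n_rounds):
        subsample = rng.sample(list(non_m3_scores), k)  # without replacement
        aucs.append(auc_mann_whitney(m3_scores, subsample))
    return aucs


# Hypothetical scores; in practice these would come from the scoring model.
m3 = [0.95, 0.9, 0.8, 0.55]
non_m3 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.15, 0.25, 0.35]
aucs = balanced_auc_distribution(m3, non_m3, n_rounds=1000)
print(min(aucs), max(aucs))
```

The resulting list of AUCs is what a density plot like Figure 3E would summarize; a fixed seed keeps the resampling reproducible.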

M3-LS model identifies additional patients like M3 subtype
Because the M3-LS model accurately predicted M3 patients, we next predicted M3-like AML patients in the three cohorts. That is, if a patient of a non-M3 subtype was predicted to be positive, the patient was assigned to an additional subtype named M3-like. In total, 7.6% of non-M3 patients in the training cohort were predicted to be M3-like, accounting for 14.04% of the M1 subtype, 9.21% of M2, and 5.2% of M4 (Figure 4A). In addition, 3.61% and 12.37% of AML patients in the two validation cohorts, respectively, were predicted to be M3-like (Figure 4A). We next sought to understand how these M3-like patients relate to the functional, biological, and clinical properties of the M3 subtype. First, AML is well known to be a malignant disease of myeloid progenitor cells. 34 We thus applied the xCell 35 method to estimate the proportion of myeloid progenitor cells in AML patients. Patients in both the M3 and M3-like subtypes had much higher common myeloid progenitor (CMP) scores than the other subtypes (Figure 4B, p < 2.2e-16, Wilcoxon's rank-sum tests). Moreover, several marker genes of the M3 subtype, such as WT1, GFI1, GATA2, and KDM1A, were expressed at significantly higher levels in M3-like patients (Figure 4C). For example, WT1, an activated target gene of PML/RARα, was not only over-expressed in AML as described above but was also repressed by both ATO and ATRA. WT1 is an important regulator of normal and malignant hematopoiesis; it is usually inactivated in APL patients, resulting in the complete loss of its inhibitory function on APL tumor cells. 36 We also observed higher expression of GATA2, which has been demonstrated to be a prognostic factor in AML, 21 in the M3 and M3-like subtypes. Combining a KDM1A inhibitor with ATRA can promote ATRA-induced differentiation of leukemia cells, 37 and we found that KDM1A expression was significantly increased in the M3 and M3-like subtypes. Significantly high expression of these genes was also observed in the validation sets (Figures S3A-S3J).
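Calling M3-like patients reduces to thresholding the normalized score for non-M3 patients and tabulating the fraction called in each FAB subtype, as in Figure 4A. The sketch below assumes one normalized score per patient, an "at or above the cutoff" rule, and the 0.560 value reported earlier; the labels and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def call_m3_like(subtypes: Sequence[str], normalized_scores: Sequence[float], cutoff: float = 0.560) -> List[str]:
    """Label each patient 'M3', 'M3-like' (non-M3 at or above cutoff), or 'other'."""
    labels = []
    for subtype, score in zip(subtypes, normalized_scores):
        if subtype == "M3":
            labels.append("M3")
        elif score >= cutoff:
            labels.append("M3-like")
        else:
            labels.append("other")
    return labels


def m3_like_fraction_by_subtype(subtypes: Sequence[str], labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of each non-M3 FAB subtype called M3-like."""
    totals = Counter(s for s in subtypes if s != "M3")
    hits = Counter(s for s, lab in zip(subtypes, labels) if lab == "M3-like")
    return {s: hits[s] / n for s, n in totals.items()}


# Hypothetical cohort of six patients.
subtypes = ["M1", "M1", "M2", "M3", "M4", "M4"]
scores = [0.7, 0.2, 0.1, 0.9, 0.6, 0.3]
labels = call_m3_like(subtypes, scores)
print(labels)  # ['M3-like', 'other', 'other', 'M3', 'M3-like', 'other']
print(m3_like_fraction_by_subtype(subtypes, labels))  # {'M1': 0.5, 'M2': 0.0, 'M4': 0.5}
```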
To explore the molecular functions related to the M3-like subtype, we first identified differentially expressed genes; Figure 4D shows the 10 most significantly differentially expressed genes in the training and validation cohorts. Among them, WT1 and GFI1 are PML/RARα target genes. Notably, target genes activated by PML/RARα were all up-regulated in the M3-like subtype, whereas target genes repressed by PML/RARα were mostly down-regulated (Figure 4D). These findings suggest that the expression patterns of PML/RARα targets in M3-like samples were highly similar to those of the M3 subtype. We further explored both carcinogenesis- and immune-related biological functions in AML patients by single-sample gene set enrichment analysis (ssGSEA). Cancer hallmark-associated pathways were obtained from the literature 38 and the MSigDB database. 39 Globally, patients in the M3 and M3-like subtypes exhibited similar pathway activities across cancer hallmarks (Figures 4E and S3K). The M3-like subtype was enriched in six particular functions, 'Negative regulation of cell proliferation', 'Negative regulation of cell cycle', 'Epithelial to mesenchymal transition', 'Cell migration', 'Vasculogenesis', and 'Chromosome organization', which relate to the 'Insensitivity to Antigrowth Signals', 'Tissue Invasion and Metastasis', 'Sustained Angiogenesis', and 'Genome Instability and Mutation' cancer hallmarks. Among the cancer hallmark-related pathways, patients in the M3 and M3-like subtypes were mostly enriched in signaling-related pathways, including WNT beta-catenin signaling, Notch signaling, Estrogen response early, TGF beta signaling, and Estrogen response late (Figure S3K). Thus, multiple properties of M3-like patients were much more similar to those of M3 patients.
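For readers unfamiliar with ssGSEA, the sketch below follows one common formulation of the single-sample score (the random-walk statistic used, for example, by the ssgsea method of the GSVA package): genes are ranked within a sample, and the score sums the difference between a rank-weighted cumulative distribution of gene-set members and the cumulative distribution of non-members. It omits any across-sample normalization, uses the conventional weight exponent 0.25, and the toy gene names are hypothetical; it is not the authors' code.

```python
from typing import Dict, Iterable


def ssgsea_score(expr: Dict[str, float], gene_set: Iterable[str], alpha: float = 0.25) -> float:
    """Single-sample enrichment score for one gene set (no across-sample normalization)."""
    genes = sorted(expr, key=expr.get, reverse=True)  # highest expression first
    n = len(genes)
    # Rank value: the highest-expressed gene gets n, the lowest gets 1.
    rank_value = {g: n - i for i, g in enumerate(genes)}
    members = set(gene_set) & set(expr)
    n_miss = n - len(members)
    if not members or n_miss == 0:
        raise ValueError("gene set must overlap the sample but not cover every gene")
    hit_total = sum(rank_value[g] ** alpha for g in members)

    score, p_hit, p_miss = 0.0, 0.0, 0.0
    for g in genes:
        if g in members:
            p_hit += rank_value[g] ** alpha / hit_total  # weighted cumulative distribution of members
        else:
            p_miss += 1.0 / n_miss  # unweighted cumulative distribution of non-members
        score += p_hit - p_miss  # sum the difference over every position in the ranking
    return score


# Hypothetical sample: G10 is the most highly expressed gene.
sample = {f"G{i}": float(i) for i in range(1, 11)}
print(ssgsea_score(sample, {"G10", "G9", "G8"}))  # positive: set sits at the top
print(ssgsea_score(sample, {"G1", "G2", "G3"}))   # negative: set sits at the bottom
```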

M3-like patients with strong GMP and distinct genomic features
A recent study has demonstrated that cellular hierarchy composition constitutes a novel framework for understanding disease biology and advancing precision medicine in AML. 40 We thus evaluated the cellular composition of the AML patients. Using a deconvolution approach, we estimated the abundance of seven leukemic cell types, three of which were leukemia stem and progenitor cells (LSPCs): quiescent LSPCs, primed LSPCs, and cycling LSPCs. The other four, GMP-like blasts, ProMono-like blasts, Mono-like blasts, and cDC-like blasts, were defined by a recent study. 41 Based on leukemia hierarchy composition, patients were classified into four distinct subtypes: Primitive (shallow hierarchy, LSPC-enriched), Mature (steep hierarchy, enriched for mature Mono-like and cDC-like blasts), GMP (dominated by GMP-like blasts), and Intermediate (balanced distribution). Patients in the M3 and M3-like subtypes had a higher proportion of GMP-like cells (Figure 5A), and most of them were classified into the GMP subtype (Figure 5B). GMP-like marker genes also tended to be highly expressed in both M3 and M3-like patients (Figure 5C). For instance, IGFBP2 is highly expressed in leukemia (Figure S4A), and inhibiting endogenous IGFBP2 expression in human leukemia cells increases apoptosis and decreases both migration and the activation of AKT and other signaling molecules. 42 MPO is generally considered the definitive marker of myeloblasts. Targeting MPO expression or enzyme activity sensitizes AML cells to cytarabine by triggering oxidative damage and persistent oxidative stress, especially in AML cells with high MPO expression 43 (Figure S4B). We also observed higher expression of CLEC11A in the M3 and M3-like subtypes (Figure S3C), and TCGA data showed that high CLEC11A expression was associated with a good prognosis 44 (Figure S4C).
To better understand the genomic features of the M3-like subtype, we analyzed somatic mutations in patients of validation cohort-1 (Figure S4D; Table S1). Overall, the mutation burden of the M3 and M3-like subpopulations was relatively higher than that of the others (Figure S4E). On the one hand, several genes, such as FLT3 and ARID1B, had higher mutation frequencies in M3 patients (Figure S4F). On the other hand, M3-like patients showed distinct genomic features involving genes such as IDH2, RAD21, and CADM3 (Figure 5D). IDH2 mutation was not detected in the M3 subtype, and the vulnerabilities of IDH2-mutant AML have been shown to confer sensitivity to APL-like targeted combination therapy. 33 RAD21 mutations were significantly enriched in M3-like patients (Figure 5D, p = 0.013, OR = 21.52). RAD21 is a core subunit of the eukaryotic cohesin complex, which regulates chromosome segregation and the DNA damage response. 45 RAD21 mutation sensitized patients to the BCL2 inhibitor ABT-199, and reducing RAD21 levels sensitized AML cells to BCL2 inhibition. 46 In detail, FLT3, CRLF1, and CALR had higher mutation frequencies in M3 patients, whereas TP53, RAD21, IDH2, and FLT3 had higher frequencies in M3-like patients (Figure 5E). Furthermore, mutations in CCDC60, BMPER, AMER3, AURKC, and AKNAD1 were found only in the M3-like subtype (Figure 5E). 48,49 AURKC is a member of the aurora subfamily of serine/threonine protein kinases and may play a role in mitosis; single nucleotide polymorphisms in AURKC have been associated with cancer risk in both glioblastoma and gastric cancer. 50,51 These specific mutations could be used to define the M3-like subtype. Together, these results suggest that M3-like and M3 patients are highly similar in terms of GMP-like cells, whereas their abnormal genomic features are distinct.
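Mutation enrichment in a subgroup (for example, RAD21 in M3-like patients) is typically summarized with a 2x2 table and Fisher's exact test. The sketch below computes a two-sided p value by summing the probabilities of all tables that are no more likely than the observed one, plus the simple sample odds ratio. The test behind Figure 5D is not specified in this section, and R's fisher.test reports a conditional maximum-likelihood odds ratio, so the sample odds ratio from this sketch will generally not match published values; the table counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: group of interest (e.g., M3-like) vs. others; columns: mutated vs. wild-type.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x mutated samples in row 1 given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables with the same margins that are no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(q for q in (table_prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unconditional sample odds ratio (a*d)/(b*c); infinite when b*c == 0."""
    return (a * d) / (b * c) if b * c else float("inf")


# Hypothetical counts: 3 of 4 group members mutated vs. 1 of 4 others.
print(fisher_exact_two_sided(3, 1, 1, 3))  # ~0.486
print(sample_odds_ratio(3, 1, 1, 3))       # 9.0
```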

M3-like patients with low immune activity and better clinical survival
Immunotherapy that modulates the tumor microenvironment (TME) shows promise in AML, 52 but its effects depend on the patient's TME. We next sought to determine whether the TMEs of M3-like patients were distinct from those of other subtypes. Immune scores estimated in the training cohort by xCell 35 were significantly lower in patients of the M3 and M3-like subtypes (Figure 6A, p < 3e-07, Kruskal-Wallis test). Similar results were found in both validation cohorts (Figures S5A and S5B), suggesting that M3-like patients, like M3 patients, had lower immune activity than other AML patients. Moreover, genes in the LM22 immunotherapy gene sets showed significantly lower expression in M3 and M3-like patients in the training cohort (Figure S5C). We also used ssGSEA to estimate the abundance of cell types and the activities of particular gene sets. Interestingly, the proportions of myeloid cells were higher in M3 and M3-like patients (Figure 6B), and the immunotherapy-related β-catenin signaling pathway was also enriched in most M3 patients. In human metastatic melanoma samples, activation of the β-catenin signaling pathway in tumors correlates with the absence of a T cell gene expression signature, which constitutes a mechanism of immunotherapy resistance. 53 In contrast, M3 and M3-like patients were less enriched for other immune-related gene sets (Figure 6B). In validation cohort-2, M3 and M3-like patients also had lower enrichment of immune-related gene sets, except for myeloid cell-related gene sets (Figure S5D).
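The Kruskal-Wallis test used for the three-group immune-score comparison can be sketched with the standard library. With three groups (for example M3, M3-like, and other) the statistic has 2 degrees of freedom, for which the chi-square survival function is exactly exp(-H/2), so no special functions are needed. The sketch uses average ranks with the usual tie correction; group names and values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks with tied values replaced by their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3groups(groups: Dict[str, Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square (df = 2) p value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    values = [v for vals in groups.values() for v in vals]
    ranks = average_ranks(values)
    n = len(values)
    h, start = 0.0, 0
    for vals in groups.values():
        rank_sum = sum(ranks[start:start + len(vals)])  # ranks follow the concatenation order
        h += rank_sum ** 2 / len(vals)
        start += len(vals)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    h /= 1 - ties / (n ** 3 - n)  # standard tie correction
    p = math.exp(-h / 2)  # chi-square survival function with 2 degrees of freedom
    return h, p


# Hypothetical immune scores for three groups.
h, p = kruskal_wallis_3groups({"M3": [1, 2, 3], "M3-like": [4, 5, 6], "other": [7, 8, 9]})
print(round(h, 3), round(p, 4))  # 7.2 0.0273
```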
Finally, we explored clinical correlations and found that patients of different subtypes had significantly different survival outcomes, in line with the observed associations with the Cancer and Leukemia Group B (CALGB) cytogenetic risk category; patients in the M3 and M3-like subtypes had better clinical survival (Figure 6C, p = 0.0004, log-rank test). Moreover, higher proportions of patients with favorable outcomes were found in the M3 and M3-like subtypes (Figure 6D, p < 2.2e-16, Fisher's exact test). Thus, M3-like patients were characterized by low immune cell infiltration and better clinical survival.
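A minimal two-group log-rank test is sketched below as a simplified illustration; the survival comparison above involves more than two groups, which requires the multivariate form of the test. At each distinct event time the sketch accumulates observed minus expected events in one group together with the hypergeometric variance, then converts the 1-degree-of-freedom chi-square statistic to a p value with erfc. Inputs and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logrank_two_groups(
    times_a: Sequence[float], events_a: Sequence[bool],
    times_b: Sequence[float], events_b: Sequence[bool],
) -> Tuple[float, float]:
    """Two-group log-rank test; events are True for an observed event, False for censoring."""
    data = [(t, bool(e), 0) for t, e in zip(times_a, events_a)]
    data += [(t, bool(e), 1) for t, e in zip(times_b, events_b)]
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var  # assumes at least one informative event time
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function with 1 degree of freedom
    return chi2, p


# Hypothetical follow-up times (months), all events observed.
chi2, p = logrank_two_groups([1, 2, 3], [True] * 3, [4, 5, 6], [True] * 3)
print(chi2, p)
```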

DISCUSSION
In this study, we developed a novel computational model to discover an M3-like subtype of AML based on the expression features of PML/RARα targets. We found that the expression of PML/RARα targets was frequently perturbed across AMLs and helped identify the M3 subtype. Previous studies have shown that some AML patients with IDH2 mutations respond well to ATRA and ATO combination therapy even though they lack the PML/RARα fusion protein. 33 We therefore hypothesized that non-M3 patients with high expression of PML/RARα up-regulated target genes and low expression of down-regulated target genes were likely to resemble the M3 subtype. Our computational model not only distinguishes M3 patients but also predicts a set of samples with expression patterns similar to the M3 subtype.
Notably, several results suggest that these M3-like patients are more consistent with the M3 subtype, including the expression patterns of several important M3 marker genes, the proportion of myeloid progenitor cells, and the deconvolution of AML constituent cell populations. Furthermore, M3-like patients exhibit some molecular features that differ from other non-M3-like patients, including genomic mutations and molecular immune signatures. Owing to the high efficacy of ATRA and ATO combination therapy, the survival prognosis of M3 patients is generally superior to that of other subtypes. 9 Interestingly, the clinical prognosis of M3-like samples was similar to that of M3 samples and significantly better than that of other samples. Moreover, an unexpected finding of our study was that both the M3 and M3-like subtypes tend to have low immune characteristics, suggesting that they may not be suitable for immunotherapy and might instead be suited to targeted therapy. The most widely accepted treatment regimen for the M3 subtype is the classic targeted combination therapy of ATO/ATRA, with cure rates of up to 95%. 6,7 Therefore, extending this treatment to more types of AML could enable more leukemia patients to be treated effectively. Our model performance was improved by further requiring the PML/RARα targets to respond to ATRA/ATO or to be differentially expressed in the M3 subtype. In addition, treatment did not significantly affect the expression of PML/RARα targets or the efficacy of the model: the Jaccard coefficient between the genes differentially expressed between treatment and diagnostic groups and the PML/RARα targets was very low (0.0188), and the AUC of a model reconstructed using only diagnostic samples was 0.96. However, some challenges remain in the optimization process. The ATRA-treated cell lines we collected were of the M3 subtype and relatively consistent, whereas the ATO-treated cell lines were derived from multiple human tissues and were heterogeneous. Hence, we used different methods to extract ATRA and ATO target genes. If ATO/ATRA treatment data from a consistent M3 background were available, our model could be further improved. Additionally, we tried to find M3-like cells among existing cell lines to test the efficacy of ATRA and ATO, but found no cell lines with high M3-LS except for NB4 (M3 type) (Table S3). In future studies, we will try to construct M3-like primary cells to validate the model.
Table footnotes: (a) GSE10358, GSE61804, GSE68833, GSE12662, GSE12417, GSE37642. (b) http://v15.proteinatlas.org/about/download. (c) GSE83449, GSE9476, GSE12417, GSE34860, GSE37642.
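The Jaccard coefficient quoted in the discussion is the size of the intersection of two gene sets divided by the size of their union, so a value near 0 means the sets barely overlap. A one-function sketch with hypothetical gene names:

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical gene sets sharing two of four distinct genes.
print(jaccard({"GENE_A", "GENE_B", "GENE_C"}, {"GENE_B", "GENE_C", "GENE_D"}))  # 0.5
```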
A large number of targeted therapies for AML are currently being developed, and great progress has been made in targeted therapy for M3 patients. We believe that identifying patients similar to the M3 subtype, as initiated in this study, may help find patients who would benefit from ATO/ATRA treatment and deepen our understanding of AML pathogenesis.

Limitations of the study
Several challenges remain in the optimization process. The ATRA-treated cell lines we collected were of the M3 subtype and relatively consistent, whereas the ATO-treated cell lines were derived from multiple human tissues and were heterogeneous. Hence, we used different methods to extract ATRA and ATO target genes. If ATO/ATRA treatment data from a consistent M3 background were available, our model could be further improved.

STAR+METHODS
Detailed methods are provided in the online version of this paper.

Figure 1. PML/RARα targets are perturbed across AMLs and help identify M3 subtype. (A) Changes in gene expression after PML/RARα knockdown. Labels show the top 10 significantly up-regulated or down-regulated genes. (B) Enrichment of PML/RARα target genes among genes differentially expressed between AML patients and healthy control samples (FDR <0.05, FC > 1.5). Bar height is the proportion of targets that are differentially expressed, and the line chart shows the -log10(p value) of the hypergeometric test between the differentially expressed genes and the target genes. (C) Top: PML/RARα effects on the transcriptional activity of the directly activated gene WT1. The diagram shows ChIP-seq signal at WT1 after PML/RARα knockdown using small interfering RNA (siRNA) targeting the fusion site of PML/RARα, displayed as genome browser tracks of PML/RARα binding. Bottom: RNA-seq abundance compared between two control samples and two samples with siRNA knockdown of PML/RARα. The ChIP-seq data were obtained from a previous study. 5 (D) Functional enrichment analysis of differentially expressed PML/RARα target genes in each AML subtype. (E) t-SNE analysis of PML/RARα target gene transcriptomic data for 519 AML samples in the training cohort. Each point represents a sample in a two-dimensional projection, and each subtype is shown in a different color. M3 subtype samples, shown as red dots, cluster together spontaneously.

Figure 2. M3-LS model accurately predicts M3 subtype in AML. (A) Random forest, XGBoost, and M3-LS models were used to predict M3 samples, and receiver operating characteristic (ROC) curve analysis was used to evaluate the prediction models. (B) Proportion of samples in each subtype predicted as M3-like by the optimized model; amaranth represents the proportion predicted to be M3-like, and yellow the proportion not predicted to be M3-like. (C) Comparison of model scores across AML subtypes. Box and violin plots show the median and the 25th and 75th percentiles; purple box and violin plots represent model scores for all AML samples except the M3 subtype. Statistical significance was assessed with the Wilcoxon rank-sum test. (D-I) Validation cohorts 1 and 2. (D and G) Random forest, XGBoost, and M3-LS models were used to predict M3 samples. (E and H) Proportion of samples in each subtype predicted as M3-like by the optimized model. (F and I) Comparison of model scores across AML subtypes.

Figure 3. Performance of the M3-LS model was improved by integrating ATO/ATRA response genes (A) Venn diagram of the gene sets used for model optimization: leading edge genes (LEGs) of ATO, ATRA differential genes combined by robust rank aggregation, PML/RARα target genes, and genes differentially expressed between M3 samples and healthy controls in the training cohort. (B) Random forest, XGBoost, and the optimized M3-LS index were used to predict M3 samples, and the prediction models were evaluated by ROC analysis in the training cohort. (C) Comparison of optimized model scores across subtypes. (D) ROC validation of the model: the model was applied to several randomly selected sample sets, and the line graph shows the corresponding AUC values. (E) Probability density distribution of the AUC. (F) Random forest, XGBoost, and the optimized M3-LS index were used to predict M3 samples, and the prediction models were evaluated by ROC analysis in validation cohort-1. (G) Comparison of model scores across subtypes in validation cohort-1. (H) Random forest, XGBoost, and the optimized M3-LS index were used to predict M3 samples, and the prediction models were evaluated by ROC analysis in validation cohort-2. (I) Comparison of model scores across subtypes in validation cohort-2. Statistical significance was assessed by Wilcoxon rank-sum test.
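Panels (D) and (E) evaluate the model on randomly selected sample sets and plot the resulting AUC values and their density. The caption does not give the subset size, number of repeats, or sampling scheme, so the sketch below assumes simple random sampling without replacement, keeps only subsets that contain both M3 and non-M3 samples, and fixes a seed; all of these are illustrative choices, not the published procedure.

```python
import random
import statistics


def roc_auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC; labels are 1 (M3) or 0 (non-M3)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subsample_aucs(scores, labels, subset_size, n_repeats, seed=0):
    """AUCs of the same model scores on random subsets of samples."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("data must contain both M3 and non-M3 samples")
    if not 2 <= subset_size <= len(scores):
        raise ValueError("subset_size must be between 2 and the sample count")
    rng = random.Random(seed)
    indices = range(len(scores))
    aucs = []
    while len(aucs) < n_repeats:
        picked = rng.sample(indices, subset_size)
        sub_labels = [labels[i] for i in picked]
        if 0 < sum(sub_labels) < subset_size:  # skip single-class subsets
            aucs.append(roc_auc([scores[i] for i in picked], sub_labels))
    return aucs


if __name__ == "__main__":
    scores = [0.95, 0.7, 0.65, 0.3, 0.2, 0.1, 0.4, 0.05]  # toy scores
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    aucs = subsample_aucs(scores, labels, subset_size=5, n_repeats=200)
    print(statistics.mean(aucs), min(aucs), max(aucs))
```

The returned list is what a density plot such as panel (E) would summarize, for example with a kernel density estimate.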

Figure 4. The M3-LS model identifies additional patients resembling the M3 subtype (A) Proportion of samples predicted as M3-like by the optimized model in each subtype in the training and validation cohorts. Amaranth represents the proportion of samples predicted to be M3-like, and yellow the proportion not predicted to be M3-like. (B) Violin plots of the proportion of common myeloid progenitors (CMP) in each subtype in the training cohort. Box plots show the median and the 25th and 75th percentiles of the CMP proportion for each subtype; p values were calculated using the Kruskal-Wallis test. (C) Expression levels of WT1, GFI1, GATA2, and KDM1A compared among AML cases predicted as M3-like, M3 subtype cases, and other samples in the training cohort; p values were estimated using the Kruskal-Wallis test. (D) Differential expression of PML/RARα target genes between M3-like and other samples in the training and validation cohorts. The heatmap shows the fold change (FC) of differential genes in M3-like samples relative to other samples; genes in red font are characteristic genes of the M3 subtype. (E) Cancer Hallmark pathway enrichment in M3, M3-like, and other samples. The heatmap shows single-sample gene set enrichment analysis (ssGSEA) scores of each subtype in each Cancer Hallmark pathway (Wilcoxon rank-sum test; *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001). Data are represented as mean.
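Panels (B) and (C) compare distributions across groups with the Kruskal-Wallis test. A minimal sketch of the H statistic, using average ranks and the standard tie correction, follows. For a three-group comparison such as panel (C) (M3, M3-like, other), H is referred to a chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(-H/2); the helper relies on that closed form and therefore applies only to three groups. With more groups, as in panel (B), a general chi-square survival function such as scipy.stats.chi2.sf with k - 1 degrees of freedom would be needed. The example values are illustrative.

```python
from collections import Counter
from math import exp


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic (not all values tied)."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    return h / (1 - ties / (n ** 3 - n))


def kruskal_wallis_3groups(a, b, c):
    """H and p value for exactly three groups (chi-square, 2 df)."""
    h = kruskal_wallis_h(a, b, c)
    return h, exp(-h / 2)
```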

Figure 5. M3-like patients show strong GMP features and distinct genomic features (A) Relative abundance of each leukemic cell type per patient. Each bar represents a patient, and the distribution of colors on each bar shows the composition of leukemic cell populations within that patient's leukemic hierarchy. (B) Hierarchical classification of leukemic cells for each subtype in the training cohort. (C) Expression of GMP-like marker genes in M3, M3-like, and other samples in the training cohort. (D) Mutation frequency of selected genes in M3-like (left) and other (right) samples. Statistical significance was assessed by Fisher's exact test. (E) Top 10 most frequently mutated genes in the M3 and M3-like subtypes.
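Panel (D) compares mutation frequencies between M3-like and other samples with Fisher's exact test. The sketch below computes the two-sided p value for a 2x2 table by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, which is the usual two-sided definition. The row and column layout is an assumption for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = M3-like / other samples,
    columns = mutated / wild-type for one gene.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Probability of each feasible top-left count given the fixed margins
    probs = {
        x: comb(row1, x) * comb(row2, col1 - x) / total
        for x in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # Small relative tolerance so floating-point ties are still counted
    p = sum(pr for pr in probs.values() if pr <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Toy counts: 3 of 4 M3-like vs 1 of 4 other samples mutated
    print(fisher_exact_two_sided(3, 1, 1, 3))
```

The same function applies to panel (D) of Figure 6, where the two columns would be favorable versus not favorable instead of mutated versus wild-type.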

Figure 6. M3-like patients show low immune activity and better clinical survival (A) Immune scores for each subtype calculated using xCell. Box plots show the median and the 25th and 75th percentiles of immune scores for each subtype; p values were calculated using the Kruskal-Wallis test. (B) Enrichment of immune and myeloid gene sets in M3, M3-like, and other samples in the training cohort. The heatmap shows ssGSEA scores of each subtype in each gene set. (C) Kaplan-Meier survival analysis of AML cases predicted as M3-like versus M3 subtype and other samples in validation cohort-1; p values were estimated using the log-rank test. (D) Percentage of patients classified as favorable in each subtype in validation cohort-1 (Fisher's exact test; *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001). Data are represented as mean.
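Panel (C) compares survival curves with the log-rank test. The minimal sketch below handles two groups only (for example M3-like versus other samples); the 1/0 group encoding and the toy data are assumptions. At each distinct event time it accumulates observed minus expected deaths in group 1 and the hypergeometric variance; the squared difference over the variance is referred to a chi-square distribution with 1 degree of freedom, for which P(chi2 > x) = erfc(sqrt(x/2)). Comparing three groups at once, as in the figure, would use the k-group generalization with 2 degrees of freedom.

```python
from math import erfc, sqrt


def logrank_two_groups(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = death observed, 0 = censored;
    groups: 1 = e.g. M3-like, 0 = comparison group (hypothetical encoding).
    Returns (chi-square statistic, p value with 1 degree of freedom).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]  # still under follow-up
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; check group labels")
    chi2 = (observed1 - expected1) ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))
```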

Table 1. Characteristics of AML patients

Table 2. Cell lines treated with ATO or ATRA