Integration of Transcriptomic Features to Improve Prognosis Prediction of Pediatric Acute Myeloid Leukemia With KMT2A Rearrangement

Lysine methyltransferase 2A-rearranged acute myeloid leukemia (KMT2A-r AML) is a special entity in the 2022 World Health Organization classification of myeloid neoplasms, characterized by a high relapse rate and adverse outcomes. Current risk stratification is based on treatment response and the translocation partner of KMT2A. To study the transcriptomic features and refine the current stratification of pediatric KMT2A-r AML, we analyzed clinical and RNA sequencing data of 351 patients. Using the least absolute shrinkage and selection operator algorithm, we identified 7 genes (KIAA1522, SKAP2, EGFL7, GAB2, HEBP1, FAM174B, and STARD8) whose expression levels were strongly associated with outcomes. We then developed a transcriptome-based score that divided patients into 2 groups with distinct gene expression patterns and prognosis; the score was further validated in an independent cohort and outperformed the LSC17 score. We also found that cell cycle, oxidative phosphorylation, and metabolism pathways were upregulated in patients with inferior outcomes. By integrating clinical characteristics, we proposed a simple-to-use prognostic scoring system with excellent discriminability, which allowed us to identify allogeneic hematopoietic stem cell transplantation candidates more precisely. In conclusion, pediatric KMT2A-r AML is heterogeneous at the transcriptomic level, and the newly proposed scoring system combining clinical characteristics and transcriptomic features can be instructive in clinical routines.

Translocation of 11q23 involving the lysine methyltransferase 2A (KMT2A, also known as MLL) gene is one of the most frequent leukemia-defining abnormalities and affects both lymphoid and myeloid lineages. It accounts for nearly 10% of all pediatric leukemias and is the most common type of infant leukemia. 1 Clinically, KMT2A-rearranged (KMT2A-r) leukemias are characterized by hyperleukocytosis, hepatosplenomegaly, resistance to conventional therapy, a high relapse rate, and dismal outcomes. To date, at least 135 fusion partners of KMT2A have been described, with MLLT3, MLLT4, MLLT10, and MLLT1 being the most common in KMT2A-r acute myeloid leukemia (AML), and they are also clinically and biologically heterogeneous. 2,3 Despite great advances in molecular biology in recent years, the prognosis of KMT2A-r AML has not improved much.
Risk stratification of AML plays a critical role in clinical decision-making. Currently, risk groups are mainly determined by the translocation partner of the KMT2A gene. In the 2022 ELN guidelines, all adult KMT2A-r AML except for KMT2A::MLLT3 were regarded as high risk with very adverse outcomes. 4 However, this did not seem to work well in pediatric AML. Yuen et al 5 demonstrated that pediatric AML with KMT2A::MLLT1 had higher overall survival (OS) and lower relapse rates than KMT2A::MLLT3. Balgobind et al 6 found that AML with KMT2A::MLLT11 had an excellent outcome and failed to confirm the favorable outcome of KMT2A::MLLT3 in pediatric AML. These findings indicated greater heterogeneity in pediatric cohorts, suggesting that risk stratification based solely on the translocation partner is far from adequate. Meanwhile, compared with other subtypes of AML, traditional predictors such as cytogenetic abnormalities and certain gene mutations are less informative about the prognosis of KMT2A-r AML. 7 In 2013, Groschel et al 8 found that high expression of the EVI1 gene was associated with poor prognosis in KMT2A-r AML, which was further validated in patients who underwent allogeneic hematopoietic stem cell transplantation (HSCT). 9,10
[12][13] These studies suggested that transcriptomic features might serve as promising biomarkers to distinguish patients with different risk levels. Aiming to refine current risk stratification and provide evidence for further research, we analyzed the clinical and RNA sequencing data of pediatric KMT2A-r AML and developed a prognostic

Genes that were expressed (RPKM >1) in >30% of samples were further analyzed. The batch effect generated by various library preparation strategies was corrected using the ComBat function in the sva R package. The differentially expressed genes were identified by the limma R package, with an adjusted P value of <0.05 and a log2-fold change of >2. The R package clusterProfiler was then used to perform gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. Over-representation of GO- and KEGG-related terms was assessed with Fisher's exact test and corrected for multiple testing by the Benjamini-Hochberg method. Only terms with an adjusted P value of <0.01 were considered for GO analyses and a P value of <0.05 for KEGG analyses.
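The expression filter and the multiple-testing correction described above can be illustrated with a minimal sketch. It is written in Python rather than R, it is not the code used in this study, and it does not reproduce the sva, limma, or clusterProfiler implementations. The function names, gene names, and toy values are hypothetical and serve only to show the two rules: retaining genes with RPKM >1 in >30% of samples, and adjusting P values by the Benjamini-Hochberg procedure.

```python
from typing import Dict, List


def filter_expressed_genes(rpkm: Dict[str, List[float]],
                           min_rpkm: float = 1.0,
                           min_fraction: float = 0.30) -> List[str]:
    """Keep genes with RPKM > min_rpkm in more than min_fraction of samples."""
    kept = []
    for gene, values in rpkm.items():
        n_expressed = sum(1 for v in values if v > min_rpkm)
        if n_expressed / len(values) > min_fraction:
            kept.append(gene)
    return kept


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy input: gene names and values are invented for illustration only.
toy_rpkm = {
    "GENE_A": [5.0, 3.2, 0.4, 2.1],  # RPKM > 1 in 3 of 4 samples: kept
    "GENE_B": [0.2, 0.0, 1.5, 0.3],  # RPKM > 1 in 1 of 4 samples: dropped
}
kept_genes = filter_expressed_genes(toy_rpkm)
adjusted_p = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In this toy input only GENE_A passes the filter, because 25% of samples does not exceed the 30% threshold for GENE_B.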

Statistical analysis
Continuous variables were presented as median and interquartile range (IQR) and compared by the Mann-Whitney U test. Categorical variables were presented as percentages and compared by Fisher's exact test. Univariable Cox regression was applied to identify genes potentially associated with EFS. Least absolute shrinkage and selection operator (LASSO) regression was performed with the glmnet package in R to construct a prognostic model based on the expression levels of genes that were strongly associated with EFS. LASSO regression is a regularization technique that shrinks the regression coefficients toward zero, which results in some coefficients being exactly equal to zero. The optimal shrinkage parameter λ, which controls the number of included genes, was determined by 10-fold cross validation of the partial likelihood deviance. A risk score was calculated for each patient according to the model. We used the software X-tile to determine the cutoff value of the risk score. X-tile is a bioinformatics tool that converts continuous variables into categorical variables by outcome-based cut-point optimization. 18 Multivariable Cox regression was applied to investigate the independent prognostic value of factors. The OS and EFS probabilities were estimated using the Kaplan-Meier method and compared by the log-rank test using the survival R package. The CIR was estimated by adjusting for competing risks and was compared by Gray's test using the tidycmprsk R package. Model performance was assessed by the area under the time-dependent receiver operating characteristic (AUROC) curve and compared by the method of Chiang 19 using the timeROC R package. All P values were 2-sided with a significance level of 0.05. All statistical analyses were performed with R software 4.2.2 (The CRAN project, www.r-project.org).
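How LASSO shrinkage yields coefficients that are exactly zero, and how a per-patient risk score follows from the retained coefficients, can be shown with a short Python sketch. This is an illustration under stated assumptions, not the glmnet procedure used in the study. The soft-thresholding operator below is the closed-form LASSO update for a single coefficient, which coordinate-descent solvers apply repeatedly. The gene names, coefficient values, expression values, cutoff, and the choice of a strict "greater than" rule for the higher-risk group are all hypothetical.

```python
from typing import Dict


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def risk_score(expression: Dict[str, float],
               coefficients: Dict[str, float]) -> float:
    """Linear predictor: sum of coefficient x expression over model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def assign_group(score: float, cutoff: float) -> str:
    """Assumed rule: scores above the cutoff fall in the higher-risk group G2."""
    return "G2" if score > cutoff else "G1"


# Hypothetical unpenalized estimates; none of these values come from the study.
unpenalized = {"GENE_1": 1.0, "GENE_2": -0.75, "GENE_3": 0.25}
lam = 0.5  # hypothetical shrinkage parameter

# GENE_3 is shrunk to exactly zero and therefore leaves the model.
shrunk = {gene: soft_threshold(z, lam) for gene, z in unpenalized.items()}
selected = {gene: coef for gene, coef in shrunk.items() if coef != 0.0}

patient_expression = {"GENE_1": 2.0, "GENE_2": 2.0, "GENE_3": 7.0}
score = risk_score(patient_expression, selected)
group = assign_group(score, cutoff=0.25)  # hypothetical cutoff
```

A larger shrinkage parameter removes more genes from the model, which is why the text describes λ as controlling the number of included genes.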

The pKMT2A7 score predicts outcome of patients in external validation set
To validate the prognostic value of the pKMT2A7 score in an external cohort, the patients in the validation set were also separated into G1 (n = 66) and G2 (n = 56) groups. The G1 group had better outcomes than the G2 group, with higher 5-year OS (64.6% ± 6.0% versus 46.7% ± 6.9%, HR, 1.85 [95% CI, 1.08-3.19], P = 0.024) and EFS (43.5% ± 6.2% versus 22.4% ± 5.7%, HR, 1.73 [95% CI, 1.11-2.69], P = 0.014) rates (Figure 2E,F). Similarly, the G1 group of the validation set had more patients under 10 years old (74.2% versus 57.1%; P = 0.056) and with the KMT2A::MLLT3 fusion gene (34.8% versus 26.8%; P = 0.433), although these differences were not statistically significant because of the limited sample size. The MRD data were not available for the validation set.

G1 and G2 exhibited disparate gene expression patterns (Figure 3A). To investigate the biological differences, we analyzed the differentially expressed genes of the two groups. A total of 1127 upregulated and 1219 downregulated genes were identified. Then, we performed GO and KEGG pathway enrichment analyses. The top enriched GO terms for upregulated genes in G2 were mostly related to metabolic process and aerobic respiration, and the most significant pathways included cell cycle, oxidative phosphorylation (OXPHOS), and metabolism. For the G1 group, the top enriched GO terms for upregulated genes were mainly associated with mRNA processing, while the top pathways included the MAPK, Ras, chemokine, and mTOR signaling pathways (Figure 3B,C).

Establishment of a new prognostic system incorporating pKMT2A7 score and other clinical characteristics
To investigate the independent prognostic value of the pKMT2A7 score, we included it in uni- and multivariable Cox analyses along with other potential prognostic factors. Based on prior studies, we chose 10 years as the cutoff value for age at diagnosis. 22,23 The analyses showed that a high pKMT2A7 score was an independent risk factor for both OS and EFS (HR OS , 2.  2). We further conducted subgroup analyses, and the pKMT2A7 score remained a strong predictor in all subgroups, including patients aged ≥10 years or <10 years and patients with MLLT3 or other fusion partners (Suppl. Figure S3A,D).
To further refine the risk stratification, we incorporated independent risk factors for EFS into the prognostic model, except for HSCT because it is a treatment rather than a clinical feature. Integer weights were assigned according to HRs in multivariable Cox regression of EFS. Factors with HRs of 1.0-2.0 were converted into a weight of 1.0, and HRs of >2.0 were converted into a weight of 2.0. The revised risk score was formulated as follows: age at diagnosis (≥10 y old) × 1 + translocation partner (other than MLLT3) × 1 + pKMT2A7 score (G2) × 2 (Table 3). Using the new prognostic system, patients were stratified into 3 risk groups: low risk (score of 0), intermediate risk (score of 1-2), and high risk (score of 3-4) (Figure 5A). The 5-year OS rates of the low-risk, intermediate-risk, and high-risk groups were 79.8% ± 5.8%, 60.7% ± 4.0%, and 35.0% ± 4.6%, respectively (high risk versus low risk: HR, 4.

We further evaluated how the prognostic system performs without the pKMT2A7 score. By the same strategy, we assigned integer weights to the independent risk factors according to HRs in multivariable Cox regression of EFS. Results of multivariable Cox regression excluding the pKMT2A7 score are summarized in Suppl. Table S2. Age at diagnosis (≥10 y old) and translocation partner (other than MLLT3) were assigned scores of 2 and 1, respectively. We defined patients with a score of 0 as low risk, 1-2 as intermediate risk, and 3 as high risk. The newly proposed risk stratification outperformed the one that did not contain the pKMT2A7 score, with better discriminative ability, especially in identifying low-risk patients (Figure 5B-D), and higher AUROC (Figure 5E,F).
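The integrated score that includes the pKMT2A7 group can be written out as a short Python sketch. The weights and risk-group boundaries follow the text; the function names, the handling of HRs below 1.0, and the example inputs (including the two HR values) are assumptions introduced for illustration.

```python
def hr_to_weight(hazard_ratio: float) -> int:
    """Convert a multivariable HR to an integer weight (1.0-2.0 -> 1, >2.0 -> 2)."""
    if hazard_ratio > 2.0:
        return 2
    if hazard_ratio >= 1.0:
        return 1
    # Assumption: only risk factors (HR >= 1.0) are scored.
    raise ValueError("weights are defined only for HRs of 1.0 or higher")


def prognostic_score(age_years: float, partner: str, pkmt2a7_group: str) -> int:
    """Age >= 10 y (x1) + partner other than MLLT3 (x1) + pKMT2A7 G2 (x2)."""
    score = 0
    if age_years >= 10:
        score += 1
    if partner != "MLLT3":
        score += 1
    if pkmt2a7_group == "G2":
        score += 2
    return score


def risk_group(score: int) -> str:
    """Map the cumulative score to low (0), intermediate (1-2), or high (3-4) risk."""
    if score == 0:
        return "low"
    if score <= 2:
        return "intermediate"
    return "high"


# Hypothetical patients, for illustration only.
young_mllt3_g1 = prognostic_score(4, "MLLT3", "G1")    # score 0
older_mllt10_g2 = prognostic_score(12, "MLLT10", "G2")  # score 4
groups = (risk_group(young_mllt3_g1), risk_group(older_mllt10_g2))

# Hypothetical HRs showing the weight conversion rule.
weights = (hr_to_weight(1.5), hr_to_weight(2.4))
```

Because the pKMT2A7 group carries a weight of 2, a patient in G2 can never be classified as low risk, and a G2 patient with either clinical risk factor is classified as high risk.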

DISCUSSION
Herein, we analyzed clinical and RNA-seq data of 351 pediatric KMT2A-r AML patients, and our findings suggested that transcriptomic features (the pKMT2A7 score) can be informative of prognosis. We also identified biological pathways that may underlie the differences in prognosis, providing a foundation for further pathophysiology research. Finally, by combining clinical characteristics with the pKMT2A7 score, we developed a new prognostic system and refined risk stratification for pediatric KMT2A-r AML, which outperformed risk stratification based on age and translocation partners and could be instructive in clinical decision-making.
In our study, the prognosis of pediatric KMT2A-r AML remained poor, with 5-year EFS around 33%, even though over 95% of patients were originally assigned to the standard-risk group. This indicates that the risk allocation of pediatric KMT2A-r AML urgently needs refinement. Recent years have seen many efforts to develop transcriptome-based risk prediction, but they rarely focused on KMT2A-r AML. The LSC17 score is a widely accepted model for risk prediction of AML, which was validated in pediatric cohorts as well. We showed that the pKMT2A7 score had better discriminability than the LSC17 score, at least in KMT2A-r AML. This could be attributed to several reasons: (1) The LSC17 score was originally developed on adult AML data, and the transcriptomic differences between pediatric and adult AML might reduce predictive accuracy 24 ; (2) Pediatric KMT2A-r AML is a highly heterogeneous group, with some patients having favorable outcomes similar to core binding factor AML, whereas others carry an extremely poor prognosis. Consequently, KMT2A rearrangement often failed to be identified as a risk factor in analyses including all kinds of pediatric AML. 21 A model built on the entire AML cohort may lose some accuracy in discriminating patients within one specific subtype. Therefore, we recommend that pediatric KMT2A-r AML be treated as a special entity with its own risk stratification.
Interestingly, the G1 group, the one with relatively better outcomes, also had more patients under 10 years old and with the MLLT3 fusion partner, both of which were protective factors in multivariable analysis as well. This raised the question of whether the pKMT2A7 score truly conferred prognostic information or was confounded by other factors. However, the pKMT2A7 score remained the strongest predictor of survival when all potential prognostic factors were included in multivariable analysis and in subgroup analyses. Meanwhile, the fact that the discriminability of the model decreased substantially when the pKMT2A7 score was removed further underscored its prognostic value. We therefore think that the pKMT2A7 score reflects the heterogeneity within certain translocation partners. Still, larger cohorts and more integrated transcriptomic analyses are needed to address this problem.
Recently, a paper reported that, apart from translocation partners, positivity of flow cytometry-based MRD at the end of induction 2 was also a strong predictor in pediatric KMT2A-r AML and suggested including it in risk stratification. 22 However, our data showed that although only 5% of patients did not achieve MRD negativity after two courses, most patients still relapsed even with rapid MRD clearance, indicating that risk stratification based on MRD at the end of course 2 might underestimate the risk for some patients. The biggest strength of our prognostic model is that risk groups are determined at a very early stage without knowledge of treatment response, which can facilitate protocol design (for example, intensifying induction 2 or introducing more targeted therapy) to achieve a more profound remission and improve the prognosis of patients with higher risk.
HSCT is one of the major consolidation therapies for pediatric AML with adverse factors. Most current evidence supports that HSCT could improve the outcomes of pediatric KMT2A-r AML. 25,26 Nonetheless, our findings indicated that only individuals categorized as high risk or intermediate risk according to our prognostic system could potentially benefit from HSCT in CR1. Given the elevated therapy-related mortality associated with HSCT and its impact on long-term quality of life, the candidates for and timing of HSCT should be carefully considered. Therefore, further refinement of the prognostic system is warranted to uncover more heterogeneity and enhance the accuracy of HSCT candidate selection. In addition, our study identified a small group (14.5%, 51/351) of low-risk patients, who had outcomes similar to those of conventional low-risk AML such as AML with core binding factor fusion genes. For patients with low risk, HSCT at CR1 did not confer a survival benefit; thus, the improvement of prognosis relies on novel treatment approaches in addition to conventional chemotherapy.
Menin is one of the most important parts of the KMT2A complex and is critical for the development and maintenance of leukemia through epigenetic modulation. 27,28 The first menin inhibitor was developed in 2012, with the ability to down-regulate target genes involved in oncogenesis and to induce both apoptosis and differentiation of leukemia cells harboring KMT2A translocations. Subsequently, multiple preclinical studies demonstrated potent anti-tumor activity of menin inhibitors in vivo, paving the way for the ongoing clinical trial. 29,30 In addition, a menin inhibitor in combination with BCL2 inhibitors exerted synergistic lethality in cell lines. 31 These potent agents are promising for further enhancing treatment effectiveness and reducing toxicity, especially for the low-risk and intermediate-risk patients in our prognostic model. In addition, we identified some biological pathways that might be associated with inferior outcomes. Abundant laboratory evidence has linked both dysregulation of the cell cycle and metabolic abnormalities to the pathogenesis of AML. 32,33 Preclinical studies showed that agents blocking cell cycle progression pathways were effective in leukemia cell lines with KMT2A translocations. 34 Blocking these

There are some limitations of our study. First, our model was developed and validated in limited cohorts; a larger sample size is needed to validate our conclusions. Second, the protocols used in this study included randomizations (bortezomib in AAML1031 and gemtuzumab-ozogamicin in AAML0531), which could potentially bias the results. Third, because of the extensive range of translocation partners of KMT2A, we aggregated all partners other than MLLT3, which could potentially under- or overestimate the risk of some translocation partners.

Figure 1. The workflow chart showing the establishment of the integrated prognosis system.

Figure 2. (A) Heatmap of the expression levels of the 7 genes of pKMT2A7 score across patients in training set. (B) The risk score of each patient in the training set. (C,D) Kaplan-Meier curves for OS and EFS rates of G1 and G2 in training set. (E,F) Kaplan-Meier curves for OS and EFS rates of G1 and G2 in validation set. EFS = event-free survival; OS = overall survival.

Figure 3. (A) Heatmap of the expression levels of differentially expressed genes between G1 and G2. (B,C) GO enrichment and KEGG pathways analysis of upregulated genes of G1 and G2. ATP = adenosine triphosphate; GO = gene ontology; KEGG = Kyoto Encyclopedia of Genes and Genomes; MAPK = mitogen-activated protein kinase.

Figure 4. (A,B) Kaplan-Meier curves for OS and EFS rates of high and low LSC17 score groups (solid lines) and pKMT2A7 groups (dashed lines) in training set. (C,D) Time-dependent ROC curves for OS and EFS comparing pKMT2A7 score and LSC17 score in training set. EFS = event-free survival; OS = overall survival; ROC = receiver operating characteristic.

Figure 5. (A) Chart showing the cumulative score of the prognostic system. In each cell of the chart, the score for a patient is calculated from the values of each predictor. The cells are colored according to risk status: 0, low risk; 1-2, intermediate risk; and 3-4, high risk. (B,C) Kaplan-Meier curves for OS and EFS rates of risk groups stratified by the proposed prognostic system including (solid lines) and excluding pKMT2A7 score (dashed lines). (D) Different risk stratification including and excluding pKMT2A7 score. (E,F) Time-dependent ROC curves for OS and EFS comparing prognostic models including and excluding pKMT2A7 score. EFS = event-free survival; OS = overall survival; ROC = receiver operating characteristic.

Table 2
Uni- and multivariable Cox analysis of variables impacting OS and EFS

Table 3
Weighted score of each factor in the prognostic system