Mediation analysis reveals common mechanisms of RUNX1 point mutations and RUNX1/RUNX1T1 fusions influencing survival of patients with acute myeloid leukemia

Alterations of RUNX1 in acute myeloid leukemia (AML) are associated with either a more favorable outcome in the case of the RUNX1/RUNX1T1 fusion or unfavorable prognosis in the case of point mutations. In this project we aimed to identify genes responsible for the observed differences in outcome that are common to both RUNX1 alterations. Analyzing four AML gene expression data sets (n = 1514), a total of 80 patients with RUNX1/RUNX1T1 and 156 patients with point mutations in RUNX1 were compared. Using the statistical tool of mediation analysis we identified the genes CD109, HOPX, and KIAA0125 as candidates for mediator genes. In an analysis of an independent validation cohort, KIAA0125 again showed a significant influence with respect to the impact of the RUNX1/RUNX1T1 fusion. While there were no significant results for the other two genes in this smaller validation cohort, the observed relations linked with mediation effects (i.e., those between alterations, gene expression and survival) were almost without exception as strong as in the main analysis. Our analysis demonstrates that mediation analysis is a powerful tool in the identification of regulative networks in AML subgroups and could be further used to characterize the influence of genetic alterations.


Mediation analysis for the t(8;21)++ vs. RUNX1++ comparison of genes used in LSC17
The LSC17 is a prognostic score for AML patients that is based on 17 genes [3]. It is therefore of interest to examine whether some of these genes show indications of mediation activity with respect to the t(8;21)++ vs. RUNX1++ comparison. In the main analysis presented in the paper, we were not able to identify any candidates for mediator genes in this direct comparison of RUNX1 point mutations and RUNX1/RUNX1T1 fusions. As also stated in the paper, this can likely be explained by the limited sample size available for that comparison.
In this supplementary section, we present the results of an additional mediation analysis in which we only considered genes that are featured in LSC17. This restriction to only a few promising genes greatly alleviates the multiple testing problem in comparison to the main analysis: only a small number of p values have to be adjusted for multiple testing instead of, as in the main analysis, hundreds of p values. Moreover, given the prognostic power of LSC17, it seems much more likely that there are mediator genes among the genes considered in this prognostic score than among genes for which there is no known (strong) relation to the outcome of AML patients. Given the small number of considered genes and the small sample size in this analysis, we did not use a part of the data for pre-selecting promising genes here. Instead, we combined the data sets AMLCG Cohort 1 and AMLCG Cohort 2 with TCGA into a single data set in order to increase the statistical power of the analysis. Again, ComBat was used to remove batch effects between these two data sets. This combined data set featured 16 of the 17 genes considered in LSC17. Using the combined data, each of these 16 genes was tested for mediation activity with respect to the t(8;21)++ vs. RUNX1++ comparison. For this task, the testing framework by Lange and Hansen was employed in exactly the same way as described in section 'Definition of mediator effects and testing procedure' of the paper. The p values from these tests were again adjusted for multiple comparisons using the Benjamini-Hochberg procedure.
The results are presented in Supplementary Table S3. Three of the 16 genes show statistically significant mediation activity in this analysis: GPR56, KIAA0125, and NGFRAP1.
KIAA0125 was also among the three genes identified in the main analysis presented in the paper as candidates for mediation activity with respect to t(8;21)++ vs. RUNX1++. The other two of the three genes discussed in the paper, CD109 and HOPX, were not considered in the current analysis, because these two genes are not among the 16 genes from LSC17 studied here.
In summary, by using a more robust analysis workflow in this supplementary subsection, we were able to identify two further genes that are promising with respect to mediation activity for the comparison t(8;21)++ vs. RUNX1++. Note, however, that we used the same data sets, AMLCG Cohort 1 and AMLCG Cohort 2 with TCGA, for this analysis as for the main analysis. Thus, the risk that the results obtained from this analysis are over-optimistic is greater than it would have been if independent data had been available for this second analysis.
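To illustrate the multiple testing adjustment applied to the per-gene p values in this section, the following is a minimal sketch of the Benjamini-Hochberg procedure in plain Python. It is not the code used for the analysis; the function name `bh_adjust` and the example p values are hypothetical and chosen for illustration only.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p values, in the input order.

    Illustrative sketch only; `bh_adjust` is a hypothetical helper,
    not part of the analysis described in the text.
    """
    m = len(pvals)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down to the smallest so that the
    # adjusted values stay monotone and never exceed 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four tests (not results from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
significant = [p <= 0.05 for p in adj]
```

Because each raw p value is scaled by the number of tests divided by its rank, the same raw p value receives a much smaller penalty when 16 tests are adjusted than when several hundred are, which is the reason the restriction to the LSC17 genes eases the multiple testing burden.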

List of genes found to be differentially expressed using limma for the comparison t(8;21)++ vs. RUNX1++ together with p values after Benjamini-Hochberg adjustment obtained in the validation analysis
Variable p value from validation analysis (using AMLCG