Subunit-specific analysis of cohesin-mutant myeloid malignancies reveals distinct ontogeny and outcomes

Mutations in the cohesin complex components (STAG2, RAD21, SMC1A, SMC3, and PDS5B) are recurrent genetic drivers in myelodysplastic neoplasm (MDS) and acute myeloid leukemia (AML). Whether the different cohesin subunit mutations share clinical characteristics and prognostic significance is not known. We analyzed 790 cohesin-mutant patients from the Dana-Farber Cancer Institute (DFCI) and the Munich Leukemia Laboratory (MLL), 390 of whom had available outcome data, and identified subunit-specific clinical, prognostic, and genetic characteristics suggestive of distinct ontogenies. We found that STAG2 mutations are acquired at the MDS stage and are associated with secondary AML, adverse prognosis, and co-occurrence of secondary AML-type mutations. In contrast, mutations in RAD21, SMC1A, and SMC3 share features with de novo AML, including better prognosis and co-occurrence with de novo AML-type lesions. The findings show the heterogeneous nature of cohesin complex mutations and inform clinical and prognostic classification, as well as the distinct biology of the cohesin complex.

Cohesin mutations have been recognized as genetic drivers in myeloid malignancies for almost a decade; however, detailed examinations of cohesin gene-specific disease characteristics and prognostic impact are lacking. STAG2 mutations have previously been grouped with other secondary AML ontogeny-defining mutations [20,21], and the 2022 European LeukemiaNet (ELN) guidelines classify them in the adverse-risk AML subgroup [22]. However, it is unclear whether a secondary AML ontogeny attribution and adverse prognostic impact can also be assigned to the less frequent cohesin complex mutations in SMC1A, SMC3, RAD21, and PDS5B [21]. Furthermore, an independent prognostic value for any cohesin gene mutation has not previously been established [23,24].
In the largest cohort of cohesin-mutated myeloid neoplasms reported to date, we characterized the incidence, clinical presentation, genomic landscape, and clinical outcomes of cohesin subunit mutations, and identified subunit-specific effects on disease ontogeny and prognosis that inform the distinct biology of these important genetic drivers.

Patient cohort
We analyzed 2 independent cohorts of patients as described below (Supplementary Fig. 1). Cohort 1 ("DFCI cohort") included 5,191 patients seen at the Dana-Farber Cancer Institute (DFCI) from August 2014 to November 2021 with a confirmed hematologic malignancy as defined by the 2016 World Health Organization (WHO) classification [25] based on morphology and cytogenetic findings. For these cases, the WHO diagnoses were retrospectively translated to the WHO 2022 [2] classification based on published diagnostic criteria. The classification of all cohesin-mutant (MT) cases underwent independent hematopathology review. The subset of 759 patients with any detectable variant (defined by previously established allele frequency thresholds [26]) in a cohesin complex gene, regardless of disease entity, was extracted, and 311 (40.1%) patients were found to have a pathogenic cohesin mutation (as defined in the "Mutational profiling" section below). From these, 256 cohesin-MT AML, MDS, or MDS/MPN patients were compared to 3,109 wild type (WT) cases. Patients were compared in terms of demographics, clinical characteristics, and outcomes.
Cohort 2 ("MLL cohort") included a total of 479 patients treated across Germany who underwent diagnostic workup of a suspected or confirmed myeloid malignancy at the Munich Leukemia Laboratory (MLL) between 2005 and 2022 and were found to have 1 or more pathogenic mutations in STAG2, SMC1A, SMC3, or RAD21 (PDS5B mutation status was not assessed). Diagnoses from peripheral blood and bone marrow were made based on cytomorphology, cytogenetics, and molecular genetics as previously described [27][28][29] in accordance with the 2016 WHO classification and reviewed by 2 board-certified hematopathologists. All cases were classified into specific subgroups according to the WHO 2022 classification. All 479 cases were used for the demographics and disease type analyses. Only 134/479 MLL cases (28%) had sufficient clinical annotation, with available date of diagnosis and follow-up, for outcome analysis. In addition, a comparison cohort of 1378 cases (838 MDS and 540 AML) without evidence of a cohesin mutation and with available follow-up data was selected from the MLL dataset based on comprehensive sequencing data availability and used to compare outcomes. All patients gave their consent for genetic analyses and the use of laboratory results for research purposes. The study was approved by the DFCI Institutional Review Board (IRB) and the MLL IRB.

Mutational profiling
For the DFCI cohort, cytogenetic data were extracted from clinical reports of karyotype and fluorescence in situ hybridization (FISH) generated by the DFCI clinical cytogenetics laboratory. Molecular data were obtained from reports of clinical next-generation sequencing (NGS) performed using the DFCI Rapid Heme Panel (RHP) at diagnosis and relapse, as previously reported [26]. Genes included in the RHP were selected based on their known or suspected involvement in the pathogenesis of myeloid or lymphoid cancers, or inherited or acquired bone marrow failure syndromes, and are listed in Supplementary Table 1. One of two different versions of the RHP was used for sequencing analysis of study cases. Samples acquired between August 2014 and October 2019 were analyzed using RHP version 2, which was based on a custom amplicon-based approach; a minimum of 10 variant reads, or 5-9 variant reads at >33% allelic frequency, were required for mutation calling. RHP version 3 was used for samples acquired after November 2019; the revised platform used unique molecular identifiers (UMIs) for error suppression to allow reliable variant calls at a variant allelic fraction (VAF) of 0.01 or greater, requiring a minimum of 3 mutant reads. Median mutation coverage was 568x (95% confidence 100x-2036x). All truncation, frameshift, or splice site mutations in STAG2, RAD21, SMC1A, SMC3, and PDS5B were considered pathogenic, and all missense mutations were manually reviewed for non-germline allelic frequency and damaging PolyPhen score (>0.85), or for evidence in OncoKB [30] or COSMIC [31] for significant mutations in cancer [21]. Mutational and cytogenetic analyses for the MLL cohort were performed as previously reported and based on whole genome sequencing (WGS), and validation used targeted deep sequencing [32,33]. A total of 763 samples were assayed by WGS and analyzed as described in previous reports [7,34,35]. There were 1157 cases assayed by targeted sequencing, which were analyzed during routine diagnostic workup or for research purposes [7]. WGS data confirmed all mutations detected by targeted NGS panels and were further consulted for completing the mutational analysis of the 73 genes.
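The read-support rules stated above for the two RHP versions can be written as a small predicate. This is a minimal sketch of the thresholds as described in the text, not the clinical pipeline itself; the function name `passes_rhp_threshold` and its parameters are hypothetical, and any additional filters applied in practice (strand bias, UMI consensus, manual review) are not modeled.

```python
def passes_rhp_threshold(variant_reads: int, total_reads: int, panel_version: int) -> bool:
    """Check the read-support rule described for a given RHP version.

    Hypothetical helper: only the thresholds quoted in the text are encoded.
    """
    if total_reads <= 0 or not 0 <= variant_reads <= total_reads:
        raise ValueError("read counts must satisfy 0 <= variant_reads <= total_reads, total_reads > 0")

    vaf = variant_reads / total_reads  # variant allelic fraction

    if panel_version == 2:
        # Amplicon-based panel: >=10 variant reads, or 5-9 variant reads at >33% VAF.
        return variant_reads >= 10 or (5 <= variant_reads <= 9 and vaf > 0.33)
    if panel_version == 3:
        # UMI-based panel: VAF >= 0.01 with at least 3 mutant reads.
        return variant_reads >= 3 and vaf >= 0.01
    raise ValueError("panel_version must be 2 or 3")
```

Under these rules, a variant with 6 supporting reads out of 15 would be called on version 2 (VAF 0.40), whereas 6 supporting reads out of 100 would not.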

Transcriptomic analysis of BeatAML
Raw counts from patients with reported mutations [36] in STAG2, RAD21, SMC3, or SMC1A were normalized using DESeq2. Differentially expressed genes were called between any cohesin subunits using DESeq2 with an FDR < 5%. All of these genes were used for unsupervised k-means clustering, and annotated cohesin mutations were superimposed on this clustering. Genes from each k-means cluster were queried against Metascape [37], and gene sets with FDR < 0.01 were plotted.

Statistical analysis
All statistical analyses were conducted using R v4.2.1. Statistical significance was assessed at a significance level α of 0.05. Normality was assessed using the Shapiro-Wilk test. If the normality assumption was met, a two-sided unpaired t-test was applied to continuous variables between 2 groups unless stated otherwise. Non-normally distributed continuous variables between 2 groups were analyzed using a Wilcoxon rank sum test. For multiple group comparisons, an ANOVA was first performed, followed by an emmeans test for the indicated comparisons using the rstatix package v0.7.1. Mutational co-occurrence was calculated as a pairwise odds ratio (OR) for any given gene between patients with the respective cohesin-MT and WT cases. Statistical significance was derived from the Fisher exact test and adjusted for multiple testing using the Benjamini-Hochberg procedure. Outcome analyses were carried out using the Kaplan-Meier method stratified by the presence or absence of a respective mutation. Statistical comparison was conducted using Cox proportional hazards regression to estimate hazard ratios (HR) (coxph, survival package v3.2-11) and a two-sided log-rank test with default parameters using survfit. For multivariate modeling, only parameters that were significant in univariate analysis were carried forward into a Cox proportional hazards model, using transplantation as a time-dependent variable. Unless stated otherwise, only results that held significance in the merged datasets were reported in the main text; individual cohorts are presented in the supplement.
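The co-occurrence analysis described above combines three steps: a pairwise odds ratio from a 2x2 table, a Fisher exact test, and Benjamini-Hochberg adjustment across genes. The study itself used R; the following is only an illustrative pure-Python sketch of those calculations. The function names and the 2x2 layout (a = cohesin-MT with the co-mutation, b = cohesin-MT without, c = WT with, d = WT without) are assumptions made for this example.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        # Undefined or infinite when an off-diagonal cell is empty.
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value.

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)  # small tolerance for float ties
    )
    return min(1.0, p_value)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy 2x2 table, invented for illustration only (not study data).
toy_or = odds_ratio(3, 1, 1, 3)             # (3 * 3) / (1 * 1) = 9.0
toy_p = fisher_exact_two_sided(3, 1, 1, 3)  # 34/70, about 0.486
```

In a full analysis, one p-value per gene would be collected and passed to `benjamini_hochberg` to obtain the q-values used for thresholding.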

Cohesin-mutant hematologic malignancies have distinct disease characteristics
We investigated 2 large cohorts of patients diagnosed with a hematologic malignancy for the presence of a mutation in the cohesin complex (Supplementary Fig. 1). In total, we identified 790 patients with a pathogenic mutation in any of the cohesin complex genes (Fig. 1A). The incidence of cohesin mutation was 10% in MDS, 5% in MDS/MPN, and 8% in AML patients in the DFCI cohort (Supplementary Table 2). Mutations in different cohesin subunits were largely mutually exclusive, with only 12/790 (2%) cases characterized by mutations in more than 1 cohesin subunit, and they were spread throughout the coding sequence without any hotspots (Fig. 1B and Supplementary Fig. 2). STAG2 mutations were the most common and were present in 610 (77%) patients, followed by RAD21 in 104 cases (13%), SMC3 in 26 cases (3%), SMC1A in 22 cases (3%), and PDS5B in 16 cases (2%). Frameshift indel mutations were the most frequent type of mutation for most cohesin genes except for STAG2, where nonsense mutations leading to a premature stop codon comprised more than 50% of all mutations (Supplementary Fig. 3A). Patients with mutations in the cohesin complex were diagnosed with AML, MDS, and MDS/MPN in 374, 351, and 63 cases, respectively (Table 1, Supplementary Fig. 3B). In the DFCI cohort, we identified 55 patients (17.7%) with a pathologist-validated non-myeloid hematologic malignancy, including indolent and high-grade lymphomas (chronic lymphocytic leukemia (n = 7), diffuse large B cell lymphoma (n = 7), multiple myeloma (n = 3), and acute lymphoblastic leukemia (n = 8)) (Supplementary Fig. 3C). In all subsequent analyses, we excluded cases with non-myeloid hematologic malignancies and focused on patients with AML, MDS, and MDS/MPN only.
Patients were divided into a cohesin-WT (n = 4487) or cohesin-MT (n = 735) cohort. We then systematically compared the clinical and demographic features of these 2 cohorts. Cohesin-MT patients were older at the time of diagnosis (AML: 69 vs. 65 years; MDS: 73 vs. 69 years; p < 0.0001) and had different patterns of AML and MDS subtypes than their cohesin-WT counterparts. AML with myelodysplasia-related defining genetic abnormalities (AML-MR) was present in 73% of cohesin-MT compared to 34% of cohesin-WT cases (p < 0.001). Conversely, AML without genetically defined lesions (summarized as AML by differentiation) and AML with NPM1 mutation were more common among cohesin-WT than cohesin-MT cases (31% vs. 8.3%, p < 0.001; 14% vs. 11%, p = 0.14, respectively). Within MDS, the cohesin-MT cohort had a higher fraction of more advanced MDS diagnoses than the cohesin-WT cohort (MDS-IB1: 35% vs. 16%, p < 0.0001; MDS-IB2: 36% vs. 17%, p < 0.0001). Consistent with these findings, the fraction of patients with documented progression from MDS to AML was higher in MDS patients with cohesin mutations than in MDS patients without these mutations (32% vs. 21%, p = 0.005; data only available for the DFCI cohort). Notably, MDS with bi-allelic TP53 inactivation, del5q, and SF3B1-associated MDS were nearly mutually exclusive with cohesin-mutant MDS (Table 1, Supplementary Table 2). These data demonstrate that cohesin mutations segregate with distinct clinical features linked to MDS and subsequent secondary AML.
Consistently, RAD21 and SMC1A/SMC3/PDS5B mutations represented a significantly greater proportion of cohesin-mutant AML than MDS as compared to STAG2 mutations (Fig. 1C). This suggests that STAG2 mutations tend to be acquired at the MDS stage, whereas RAD21 and the other cohesin subunit mutations may be more likely to be acquired at the AML stage and lead to rapid leukemic transformation rather than a slower increase in blast count over time, as may be expected in MDS. Indeed, patients with RAD21 and SMC1A/SMC3/PDS5B mutations trended towards a higher percentage of blasts in their diagnostic AML bone marrow biopsy compared to patients with STAG2 mutations (median morphology-defined blast count of 47% for RAD21 vs. 28% for STAG2-mutant AML, p = 0.11, Supplementary Fig. 3F, data available for the DFCI cohort only).
Our genetic and cytogenetic analyses supported the hypothesis that STAG2- and non-STAG2-mutant myeloid diseases represent distinct biology and ontogeny. To investigate whether this was supported by distinct gene expression programs, we analyzed transcriptomic profiles of the cohesin-mutant AML cases in the BeatAML cohort [36]. Using unsupervised k-means clustering, we observed distinct transcriptional profiles of STAG2- and RAD21/SMC3/SMC1A-mutant cases (Fig. 3C). Gene set enrichment analysis (GSEA) highlighted differential expression of viral response and interferon signaling, as well as metabolic programs and extracellular matrix-associated pathways (Supplementary Fig. 8). In summary, the distinct co-mutational, cytogenetic, and molecular landscapes of different cohesin mutations suggest unique patterns of disease development driven by different cohesin mutations.

STAG2 mutations have prognostic impact in MDS and AML
Having established unique disease and genetic characteristics for different cohesin mutations, we next assessed their impact on clinical outcomes. We conducted independent analyses of overall survival (OS) and progression-free survival (PFS) in MDS and AML. The median follow-up time for the entire patient cohort was 73.8 months (95% confidence interval (CI) = 69.6-80.5 months) for MDS, and 49.4 months (95% CI = 45.4-54.4 months) for AML. We first compared outcomes for STAG2-mutant MDS to cohesin-WT MDS, in which STAG2 conferred a poor risk with a median OS of 30.3 versus 58.9 months (HR: 1.44, 95% CI 1.17-1.78, p < 0.001, combined cohort, Supplementary Fig. 9). Given our observations of near-mutual exclusivity of cohesin and TP53 mutations (Figs. 2 and 3A), and the well-established association of TP53 mutations with poor outcomes [38,39], we next compared the OS and AML-free survival of patients with STAG2-mutant MDS to TP53-mutant MDS and cohesin/TP53-WT MDS. We observed a significantly worse OS of STAG2-mutant MDS compared to cohesin/TP53-WT MDS (HR = 1.73, 95% CI = 1.4-2.14, median OS 30.3 vs. 69.8 months, p < 0.001, Fig. 4A), and a similar risk of leukemic transformation in STAG2- and TP53-mutant MDS cases (median AML-PFS of 15.4 months for STAG2 and 12.1 months for TP53, p = 0.3, DFCI cohort only, Fig. 4B). In a multivariable regression analysis to ascertain the effect of mutations, cytogenetics, diagnostic blast count, and age at the time of MDS diagnosis, the presence of a STAG2 mutation did not reach significance as an independent predictor of MDS outcome (Fig. 4E).
For our outcome analysis in AML, we first compared STAG2-mutant AML to cohesin-WT cases separated into cohesin-WT AML associated with myelodysplasia-related changes (hereafter referred to as "AML-MR") and cohesin-WT AML not associated with myelodysplasia-related changes (hereafter referred to as "AML-non-MR") according to the WHO 2022 classification. We observed that STAG2-mutant AML had significantly worse OS than AML-non-MR (HR = 0.62, 95% CI = 0.5-0.76, median OS 16 vs. 35 months, p < 0.001), and only a modestly better OS than AML-MR (HR = 1.43, 95% CI = 1.16-1.77, median OS 10.3 months, p < 0.001, Fig. 4C). Given the near-mutual exclusivity of STAG2 and TP53 mutations in AML, we performed a subset analysis of cohesin-WT AML excluding TP53-mutant cases, which removed most of the differences and showed very similar, and numerically even favorable, outcomes for STAG2-mutant AML compared with AML-MR without TP53 mutations (HR = 0.80, 95% CI = 0.64-0.99, median OS 13.6 vs. 11.8 months, p = 0.04, Fig. 4D). This poor outcome was also evident for the rare STAG2-mutant cases that were not diagnosed as AML-MR because of competing classifying mutations (e.g., NPM1 and/or CEBPA, Supplementary Fig. 10A, B).
Importantly, this pattern was distinct from the outcomes of RAD21-mutant AML, whose OS was almost identical to that of AML-non-MR and significantly better than that of STAG2-mutant AML (HR = 0.56, 95% CI = 0.34-0.93, median OS 48 vs. 16 months, p = 0.024) (Fig. 4C, Supplementary Fig. 11A, B). This effect was most apparent in the DFCI cohort, although we observed the same trend in the MLL cohort, with differences likely being driven by intrinsic variability in treatment and selection biases between DFCI and MLL (Supplementary Fig. 11C, D). Allogeneic stem cell transplantation cases accounted for 41% of DFCI but only 12% of MLL cases (Supplementary Table 5), and response rates to induction therapy (Supplementary Table 6) were similar between groups. The effects of STAG2 and RAD21 mutations on OS remained significant when censored for allogeneic stem cell transplantation (Supplementary Fig. 10B, Supplementary Fig. 12), although neither reached statistical significance as an independent predictor of outcome in a multivariable regression analysis of known clinical co-variables (age and transplantation as time-dependent variables) and co-mutation with ASXL1, SRSF2, RUNX1, and TP53 (Fig. 4F).
In summary, our findings suggest that only STAG2 mutations confer a negative impact on AML outcomes, which is attributed to secondary ontogeny and a genetic makeup of preceding myeloid dysplasia. Notably, the prognostic impact of RAD21 mutations is shared with de novo AML.
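The median OS values quoted in this section are Kaplan-Meier estimates. As a rough illustration of how such a median is obtained from follow-up times with censoring, the sketch below implements the product-limit estimator in plain Python. It is not the authors' code (the study used the R survival package), the function names are hypothetical, and the example follow-up times are invented.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient (e.g. months)
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= deaths + censored
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower (None if not reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented follow-up data for illustration only (not study data).
toy_curve = kaplan_meier([2, 4, 4, 6, 8, 10], [1, 1, 0, 1, 0, 1])
toy_median = median_survival(toy_curve)  # survival first drops below 0.5 at t = 6
```

Censored patients contribute to the risk set up to their last follow-up, which is why censoring at transplantation, as in the sensitivity analysis above, changes the estimate without discarding those patients entirely.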

DISCUSSION
Our study establishes a role for different cohesin subunit mutations in distinct subtypes of MDS and AML, which has significant prognostic implications, expands our current understanding of this important group of driver genes, and informs the unique biology of different cohesin subunits. We assembled and analyzed the largest existing cohort of 790 patients with cohesin-mutant hematologic malignancies and demonstrated that mutations in STAG2 and RAD21 shape the presentation and outcome of AML in unique ways, which can be explained by distinct co-mutational patterns and AML ontogeny. Furthermore, the size of our cohort strongly supports the notion that this prognostic impact is driven by disease ontogeny in both MDS and AML, which was underappreciated in significantly smaller cohorts [23,24]. We demonstrated that STAG2 mutations are associated with secondary AML ontogeny, are usually acquired at the MDS or MDS/MPN stage, and co-occur with other secondary ontogeny-defining mutations, such as ASXL1, SRSF2, and RUNX1. Our data are in agreement with initial reports identifying STAG2 as one of the eight secondary AML ontogeny-defining lesions [21], as well as the 2022 International Consensus Classification, which uses STAG2 as an AML-MR-defining mutation for classification of AML [40,41]. In contrast, we found that RAD21 mutations are associated with de novo AML, are rarely preceded by MDS or MDS/MPN, and are associated with de novo or pan-AML molecular abnormalities, such as t(8;21), FLT3, and NPM1 mutations [24]. These differences are also reflected in distinct gene expression patterns between STAG2- and non-STAG2-mutant AML, and the unique co-mutational and cytogenetic patterns likely contribute to distinct biological trajectories of leukemic evolution and warrant further investigation in preclinical models.
Importantly, clinical outcomes reflect the different ontogeny associated with STAG2 and non-STAG2 cohesin mutations, including RAD21, SMC1A, and SMC3 mutations. While we did not find them to be independent prognostic markers, the distinct pattern of outcomes is reflective of their different disease ontogeny. We observed that STAG2 mutations conferred overall survival similar to AML-MR, while RAD21-mutant cases displayed overall survival similar to AML-non-MR cases. In addition, SMC1A- and SMC3-mutant cases similarly share clinical and molecular features with AML-non-MR. We therefore propose that RAD21-, SMC1A-, and SMC3-mutant AML should be considered as AML-non-MR. Our data demonstrate that within the family of cohesin complex mutations, only STAG2 mutations are indicative of secondary ontogeny and are associated with worse clinical outcomes.
We observed that STAG2-mutant cases may have higher numbers of co-mutations (as determined by targeted sequencing panels, with their inherent limitations), which could be a clinical proxy of an intrinsically increased genomic instability. This is in line with several prior studies demonstrating that STAG2 deficiency is coupled with replication fork stalling, impaired DNA damage repair, and accumulation of DNA damage [14,[42][43][44]. These findings have contributed to the therapeutic window for inhibitors of poly(ADP-ribose) polymerase (PARP) [14], which are currently being investigated as single agents and in combination with hypomethylating agents in a proof-of-concept clinical trial for cohesin-mutant AML and MDS (ClinicalTrials.gov identifier NCT03974217). Furthermore, the association of STAG2 mutations with trisomy 8 is intriguing considering recent findings suggesting that RAD21 is the driver of chromosome 8 gain to mitigate replication stress in Ewing sarcoma [45], a disease characterized by frequent STAG2 mutations. Currently, our data do not allow us to predict the order of STAG2 and trisomy 8 acquisition, or whether trisomy 8 may affect response to DNA damage repair inhibitors or replication fork stressors, such as PARP inhibitors or hydroxyurea.
The strengths of our study include the large, well-annotated patient cohorts that were representative of clinical practice in Europe and the United States, although such retrospective analyses have an inherent selection bias, which was present in both cohorts to different extents. We therefore note that the retrospective nature of this approach limits the generalizability of our results. Furthermore, the numbers of SMC1A- and SMC3-mutant cases in our cohort were considerably smaller than the numbers of STAG2- and RAD21-mutant cases, which may limit some of our conclusions about mutations in these cohesin subunits.
In summary, our study contributes to a better understanding of the distinct effects of cohesin gene mutations in myeloid malignancies. Although the biology underlying these differences is not yet known, our work supports the notion that not all cohesin subunit mutations were created equal and that the distinct pattern of cohesin mutations across cancer types may be driven by the unique biology of cohesin subunits.

Fig. 1 Molecular characterization of cohesin mutations in hematologic malignancies demonstrates subunit-specific differences. A Oncoprint of all patients with cohesin mutations (DFCI and MLL cohorts combined), n = 790. B Lollipop plot panel of cohesin mutations for the combined cohort. C Pie charts of distribution of cohesin mutations across MDS, AML, and MDS/MPN for the combined cohort. D Box plot of the total number of pathogenic mutations identified by targeted sequencing of patients with cohesin-mutant MDS and AML at the time of diagnosis, stratified by cohesin status for the combined cohort. The Wilcoxon test was used to determine significance.

Fig. 3 Mutations in different cohesin subunits display unique mutational and chromosomal abnormality characteristics. Balloon plot for relative enrichment of co-occurrence of cohesin subunit mutations with other myeloid driver mutations (A) and chromosomal aberrations (B). The cohesin-WT cohort was used as reference to calculate enrichment, which is indicated as log2 odds ratio (OR). Combinations with q < 0.05 or 5% mutational frequency in the total cohort are shown in (A). Missing balloons indicate OR = 0. False discovery rate (FDR) is as indicated: *** < 0.0001, ** < 0.001, * < 0.05 and corresponds to dot size. C Gene expression heatmap of cohesin-mutant patients with annotated cohesin mutations and clustered by k-means (1328 genes). Samples were clustered by hierarchical clustering and annotated by cohesin mutation. Colors indicate z-score-transformed CPM per gene.

Fig. 4 Prognostic impact of STAG2 mutations in MDS and AML as secondary ontogeny mutations. Survival analysis using the Kaplan-Meier method and log-rank test for overall survival in MDS (A), AML-progression-free survival in MDS (DFCI cohort only) (B), and AML survival stratified by cohesin subunit mutational status and cohesin-WT group by AML-MR or AML-non-MR (C) and TP53 mutation (D). HR = hazard ratio. Statistical significance was determined using the log-rank test. E Forest plot for multivariate prognostic impact of STAG2 mutations for MDS OS and (F) for AML OS.

Table 1. Patient and disease characteristics of cohesin-mutant versus cohesin-wild type MDS, MDS/MPN, and AML patients.
b DFCI cohort only.

Table 2. Patient and disease characteristics of different cohesin complex mutations in MDS, MDS/MPN, and AML patients.