Epigenomic mapping reveals distinct B cell acute lymphoblastic leukemia chromatin architectures and regulators

Summary B cell lineage acute lymphoblastic leukemia (B-ALL) is composed of diverse molecular subtypes, and while transcriptional and DNA methylation profiling has been extensively examined, the chromatin landscape is not well characterized for many subtypes. We therefore mapped chromatin accessibility using ATAC-seq in primary B-ALL cells from 156 patients spanning ten molecular subtypes and present this dataset as a resource. Differential chromatin accessibility and transcription factor (TF) footprint profiling were employed and identified B-ALL cell of origin, TF-target gene interactions enriched in B-ALL, and key TFs associated with accessible chromatin sites preferentially active in B-ALL. We further identified over 20% of accessible chromatin sites exhibiting strong subtype enrichment and candidate TFs that maintain subtype-specific chromatin architectures. Over 9,000 genetic variants were uncovered, contributing to variability in chromatin accessibility among patient samples. Our data suggest that distinct chromatin architectures are driven by diverse TFs and inherited genetic variants that promote unique gene-regulatory networks.


In brief
Barnett et al. profile the chromatinaccessibility landscape across B cell lineage acute lymphoblastic leukemia (B-ALL) patient samples, identifying differences among individual patients, B-ALL subtypes, and normal B cell progenitors.Utilization of transcription factor (TF) footprint profiling and promoter capture Hi-C yielded TFbinding patterns and TF-gene associations that define B-ALL.

INTRODUCTION
Acute lymphoblastic leukemia (ALL) is derived from B and T cell lineage precursor cells and is the most common childhood cancer. 1 A majority of acute lymphoblastic leukemias are derived from B cell lineages (B-ALL) that are comprised of distinct molecular subtypes characterized by unique chromosomal lesions, including aneuploidy, translocations, gene fusions, point mutations, and other chromosomal rearrangements that drive leukemogenesis. 2Numerous studies have identified extensive heterogeneity in transcriptomes 3,4 and DNA methylomes 5,6 among B-ALL subtypes in large patient cohorts, but there is limited un-derstanding of chromatin landscapes.Here we provide an extensive survey of accessible chromatin state and cis-regulatory element activity in primary B-ALL cells from more than 150 patients across the United States.
Chromatin accessibility or open chromatin is a hallmark of active cis-regulatory elements that control spatial and temporal gene expression. 7Because ALL typically involves mutations (PAX5-altered), complex rearrangements (e.g., DUX4-rearranged, PAX5-altered, ZNF384-rearranged), and/or oncogenic gene fusions (e.g., ETV6::RUNX1, TCF3::PBX1, KMT2A-rearranged) of transcription factor (TF) genes, as well as disruptions of cis-regulatory elements, 8  (legend continued on next page) 2 Cell Genomics 3, 100442, December 13, 2023 Resource provide valuable information to better understand the leukemogenic process.Accessible chromatin sites can be mapped using transposases by performing assays for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq). 9,10lthough DNase treatment has also been used, 11 one key advantage of ATAC-seq is the low sample input requirements compared to DNase-based assays.This makes ATAC-seq an attractive assay for mapping open chromatin in primary cells from patients wherein sample availability is limited.Additionally, chromatin accessibility allows for identification of bound TFs through an examination of TF footprints, which are defined by depletion in DNA transposition 12 or DNase 13 cleavage events within regions of accessible chromatin signal.As a result, the underlying TF-binding gene-regulatory networks that promote chromatin accessibility and differential gene expression can be predicted.
Using ATAC-seq chromatin-accessibility and histone profiling in primary ALL cells, we mapped cis-regulatory element activity in B-ALL.In complement to chromatin-accessibility profiling, we identified thousands of chromatin loops targeting promoters in multiple B-ALL cell lines to better inform linkages of cis-regulatory elements to cognate genes.We coupled these maps to TF footprints at accessible chromatin sites to identify key TFs and gene-regulatory networks across B-ALL samples and within distinct B-ALL subtypes.Our results identified extensive chromatin reprogramming between B cell progenitors and B-ALL as well as extensive heterogeneity in accessible chromatin landscapes among B-ALL subtypes.Specifically, we uncovered a focused subset of over 42,000 B-ALL open chromatin sites exhibiting extensive subtype enrichment and subtype-enriched TF-binding events.Notably, these sites can predict and classify B-ALL samples with 86% cross-validation accuracy.We additionally explored the impact of inherited genetic variation on the chromatin state and delineated over 9,000 ATAC-seq chromatin-accessibility quantitative trait loci (ATAC-QTLs) in B-ALL cells, including a subset that alters neighboring gene expression.Using this expansive B-ALL chromatin-accessibility dataset, our data collectively support substantial subtype specificity in chromatin accessibility that is driven in part by distinct TFs, as well as pronounced inter-individual heterogeneity in chromatin state through inherited genetic variants.Our work further supports the role of these distinct chromatin architectures in establishing unique gene-regulatory networks that impact gene expression and B-ALL cell biology.

RESULTS
Chromatin-accessibility profiles of B-ALL patient samples spanning multiple subtypes ATAC-seq using the Fast-ATAC 10 method was performed on recently harvested primary ALL cells from 156 patients spanning ten B-ALL molecular subtypes (BCR::ABL1, DUX4-rearranged, ETV6::RUNX1, high hyperdiploid, low hypodiploid, KMT2A-rearranged, Ph-like, PAX5-altered, TCF3::PBX1, and ZNF384-rearranged) and B-other samples (Table S1) from diverse medical centers, research groups, and clinical trials networks across the United States (see STAR Methods).To identify high-confidence sites, we identified ATAC-seq peak summits using subtype-merged data and selected only loci reproducible among unmerged individual patients.Using this approach, we identified 110,468 accessible chromatin sites, on average, in each B-ALL subtype (range 71,797-142,498), with 217,240 merged sites identified in total representing the final B-ALL genomic regions of interest (Figure 1A; Table S2).
In most cases, these candidate cis-regulatory elements map within intergenic or intragenic loci with unclear gene targets.
Therefore, to better inform gene connectivity, we produced chromatin looping data using promoter capture Hi-C 21 across ten patient samples (BCR-ABL1, ETV6-RUNX1, KMT2A-rearranged, Ph-like, TCF3-PBX1, and B-other subtypes) and seven B-ALL cell lines (697, BALL1, Nalm6, REH, RS4;11, SEM, and SUP-B15) to complement B-ALL patient chromatin-accessibility profiles.Collectively, across patient samples and B-ALL cell lines we detected approximately 300,000 chromatin loops, with approximately 50% of the 217,240 chromatin-accessible regions of interest intersecting with a promoter loop, including 15,929 chromatin-accessible sites that looped to a cancer-implicated gene set (Figure 1C). 22,23In many instances, large domains of extensive chromatin looping are present, which with chromatin accessibility and active histone marks emphasize the gene-regulatory networks present across B-ALL patient samples (e.g., Figures 1D and 1E).

Chromatin accessibility identifies Pro-B cell of origin for most B-ALL patient samples
To better understand chromatin remodeling during leukemogenesis, we sought a comparison of chromatin accessibility between B-ALL and B cell progenitors.Moreover, although it is widely accepted that the B-ALL cell of origin is a B cell precursor, exactly which precursor is not always clear, particularly at the chromatin-accessibility level. 24To resolve this uncertainty, we examined publicly available ATAC-seq data from several human B cell progenitors 10,25 (Figure 2A).When comparing chromatinaccessibility signal between B cell progenitor groups, we identified a set of approximately 42,344 genomic loci that demonstrate a chromatin-accessibility enrichment or depletion trend for a B cell progenitor (Figure 2B; Table S3; see also STAR Methods).We refer to these chromatin loci as B-progenitor identity loci, due to their distinct patterning across B-progenitor differentiation and likely representation of stage-specific gene-regulatory programs.
Next, we examined patient B-ALL cell chromatin accessibility across these B-progenitor identity loci.When plotting chromatin-accessibility signal as a heatmap comparing B cell progenitors and B-ALL patient samples, a high degree of similarity was observed with prePro-B cells and Pro-B cells (Figure 2B).Further, when applying the k-nearest neighbor classification model previously trained on B-progenitor identity loci, the majority of B-ALL samples were classified as either prePro-B or Pro-B (Figures 2C and 2D).However, prePro-B cells have been reported to be an extremely rare population beyond embryonic and fetal development. 25Overall, Pro-B cells demonstrate the most similarity to B-ALL cells at the chromatin-accessibility level when focusing specifically on B cell precursor defining loci, emphasizing this precursor B cell as a common cell of origin for B-ALL.

Extensive differences in chromatin accessibility between B-ALL and Pro-B cells
To better understand chromatin remodeling during leukemogenesis, we next compared accessible chromatin sites between B-ALL and Pro-B cells (n = 3) using DESeq2 at the 217,240 merged B-ALL chromatin-accessibility peaks and uncovered 42,661 differentially accessible chromatin sites (DASs) exhibiting lesser or greater accessibility in B-ALL samples (Figures 3A, 3B, and S1; Table S4).Ontology analysis focusing strictly on DASs with higher chromatin accessibility in B-ALL indicated an enrichment for sites associated with genes involved with Toll-like receptor signaling, interleukin production, metabolism (acetylcoenzyme A [CoA] production), and cell proliferation (Figure 3C).Enriched ontology terms were frequently present at multiple foldchange thresholds of input B-ALL DASs (Table S5).
In addition to profiling differential chromatin accessibility, global TF binding was also compared between B-ALL and Pro-B cells.To identify differential TF binding, we performed genome-wide TF footprint profiling 12 using 810 TF motifs comparing B-ALL patient samples and normal Pro-B cell samples across all B-ALL genomic regions of interest (217,240 regions).Differential binding scores indicated the AP-1 family of TFs (e.g., FOS, JUN) as the most prominent TFs with higher binding in B-ALL patient samples compared to normal Pro-B cells (Figure 3D).In contrast, prominent TFs with higher binding in Pro-B cells were those such as TFAP2A, KLF15, CTCFL, ZBTB14, and EBF1.
To further demonstrate AP-1 TF occupancy in B-ALL accessible chromatin sites, we performed CUT&RUN for FOSL2, JUN, and JUNB in 697 and SUB15 human B-ALL cell lines (Figures 3E and S2).Collectively, we identified 56,650 (697: FOSL2 = 66,040, JUN = 47,381, JUNB = 4,730) and 61,121 (SUPB15: FOSL2 = 106,961, JUN = 9,971, JUNB = 14,784) AP-1 binding sites in 697 and SUPB15 cells, respectively, with a final merged AP-1 region set including both 697 and SUPB15 regions of 88,650.Intersections with B-ALL accessible chromatin sites from primary cells using the merged AP-1 regions identified that 28% (61,090) of these sites were occupied by an AP-1 TF in B-ALL cell lines (41,002 sites and 19% of all B-ALL accessible chromatin sites from primary cells in 697; 45,685 sites and 21% of all B-ALL accessible chromatin sites from primary cells in SUPB15).Strikingly, our results further uncovered that 46% of DASs with higher chromatin accessibility in B-ALL (i.e., B-ALLenriched DASs) also exhibit AP-1 TF occupancy (Figure 3F), thereby supporting the activation of AP-1 TF-associated cis-regulatory in B-ALL.We determined that even though most AP-1occupied B-ALL-enriched DASs localized to promoter-distal regions of the human genome (77%), there is a 2.7-fold enrichment for AP-1 occupancy at B-ALL-enriched promoters compared to B-ALL-enriched DASs devoid of AP-1 occupancy (Figure 3G; 16% vs. 6%).Further integration of AP-1-occupied B-ALLenriched DASs with promoter capture Hi-C in B-ALL cell lines identified target genes that were enriched for cell cycle, autophagy, and apoptotic signaling pathways (Table S6; example in Figure 3H).Select B-ALL-enriched, promoter-distal DASs predicted to be AP-1 bound by TF footprinting within B-ALL patient samples but not Pro-B cells were targeted for CRISPR-Cas9mediated genomic deletion in B-ALL cell lines (Figure S3).Validating their role as B-ALL cis-regulatory elements, analysis of heterogeneous deletion pools identified effects on neighboring gene expression and cellular proliferation (Figure S4).
As an extension of our TF footprinting data, we also integrated B-ALL patient promoter capture Hi-C using the ABC enhancer algorithm to refine identification of TF-target gene relationships across top TFs and a cancer-implicated gene set. 26 we focused on top TF footprints within B-ALL-enriched DASs and the cancer-implicated gene targets of these DASs predicted by the ABC enhancer algorithm.Concordant with global TF footprint and AP-1 TF occupancy analyses, we identified the AP-1 family as top TFs in this network.We also identified other top TFs from TF footprinting, such as CEBP family TFs and BACH2 (Figure 3I).Other prominent top TFs include NFIC, XBP1, TBX2, and numerous basic helix-loop-helix (bHLH) class TFs (e.g., MYOG, MYF5 and HES5).Top expressed cancer-implicated gene targets for each TF converged on notable genes involved in cell signaling (SMAD3, PTPRJ), BCL6 regulation (FBXO11, BCOR), and transcriptional regulation (RUNX1, ERG, CUX1) (Figure 3I).Largely consistent results were further obtained using promoter capture Hi-C data from B-ALL cell lines (Figure S5).Collectively, these results highlight alterations of signaling pathways and TF-binding networks that facilitate the proliferative potential of B-ALL samples.

Identification of subtype-enriched chromatin architecture
To better understand chromatin accessibility within B-ALL, intersubtype analyses were performed to identify DASs exhibiting subtype-enriched signal (henceforth referred to as subtype-enriched DASs) in ten B-ALL molecular subtypes harboring known molecular drivers (BCR::ABL1, DUX4-rearranged, ETV6:: RUNX1, high hyperdiploid, low hypodiploid, KMT2A-rearranged, Ph-like, PAX5-altered, TCF3::PBX1, and ZNF384-rearranged; Figures 4A-4C).For this analysis, we compared a single B-ALL subtype cohort with all other B-ALL cell samples not belonging to that subtype in pairwise fashion covering all subtypes using DESeq2 differential analysis across the 217,240 B-ALL accessible chromatin sites from primary cells.This approach was utilized to emphasize high degrees of subtype enrichment compared to the full spectrum of chromatin-accessibility variability in the remaining sample cohort.We identified between 452 and 10,590 DASs in each B-ALL subtype, with a total of 42,753 subtype-enriched DASs identified across all ten B-ALL subtypes (log 2 fold change >1 or <1, false discovery rate [FDR] < 0.05; Figure 4B; Table S7).We annotated subtype-enriched DASs on a subtype basis and determined that a majority of subtype-enriched DASs in each B-ALL subtype (87%, range 80%-90%) localized to promoter-distal regions of the genome (intronic and distal intergenic; Figure S6) and 43%, on average (range 39%-49%), localized to distal intergenic regions, thereby emphasizing the importance of non-genic loci in defining B-ALL chromatin heterogeneity.
To further determine functional effects on gene expression, we integrated subtype-enriched DASs with differentially expressed genes (DEGs) uniquely upregulated (log 2 fold change > 1, FDR < 0.05) in each of the ten B-ALL molecular subtypes to determine whether they were enriched near DEGs.We identified a statistically significant enrichment of subtype-enriched DASs near upregulated DEGs in nine of ten subtypes compared to total expressed genes in the corresponding subtype and uncovered a strong statistical trend in Ph-like B-ALL (Kolmogorov-Smirnov test, p = 0.053; Figure S8).We additionally selected several top differential DAS genomic regions for targeting with the CRISPR interference (CRISPRi) dCas9-KRAB repressor as a test of their effects on neighboring gene expression (Figures 4D and S9).Putative cis-regulatory elements were targeted within a B-ALL cell line context corresponding to the origin of the B-ALL subtype DASs (Nalm6 = DUX4-rearranged, SEM = KMT2A-rearranged).These select DAS regions each demonstrated repression of the corresponding nearby genes (LNX1, MAP7, SENP6) when targeted with a dCas9-KRAB repressor, indicating a genuine cis-regulatory element (Figure 4D).Consequently, these data support the role of subtype-enriched DASs in gene regulation and gene activation and further suggest that differences in chromatin accessibility contribute to transcriptomic differences among B-ALL subtypes. 3,4Collectively, these results highlight extensive open chromatin heterogeneity among B-ALL molecular subtypes.

Mapping transcription factor drivers and generegulatory networks in B-ALL subtypes
We performed TF footprint profiling for 810 TF motifs across all B-ALL chromatin-accessibility sites (N = 217,240) using merged ATAC-seq signal from ten B-ALL subtypes with known molecular drivers to identify subtype-enriched TF drivers.TF footprint profiling 12 identified between 4,303,155 and 5,441,937 bound motifs in each B-ALL subtype, with 49,402,067 TF footprints at 815,992 unique genomic loci identified across all subtypes.Using these data, we next identified key TF footprints that were enriched in each subtype (i.e., subtype-enriched TF footprints) by calculating differential footprint scores between every subtypesubtype pair for each TF motif.The top median differential motif scores for each subtype were selected as subtype-enriched TF Seventy-two hours after doxycycline (100 ng/mL) induction of SEM (KMT2A-rearranged) and NALM6 (DUX4-rearranged), B-ALL cell lines expressing doxycycline-inducible dCas9-KRAB and transduced with negative control single-guide RNAs (sgRNAs) (non-coding and non-targeting) or sgRNAs targeting subtypeenriched DASs (Enh1 and Enh2) are shown.Gene expression is normalized to the average of the two control sgRNAs (error bars denote ±standard error of the mean).Significance was calculated by a two-sample t test of combined biological replicates for both control sgRNAs versus both DAS-targeting sgRNAs.**p < 0.01; ***p < 0.001.footprints.This approach was utilized to emphasize differential TF footprint motifs that were consistent and distinct for each subtype rather than repetitive global trends (Figure 5A).Notably, subtype-enriched TF footprints were identified for recognized TF drivers such as DUX4 in DUX4-rearranged ALL and ZNF384 in ZNF384-rearranged ALL.We also identified HOX family TFs (HOXA9, HOXB9, HOXC9, and HOXD9) in KMT2A-rearranged ALL, GATA family TFs (GATA2, GATA3, GATA4, GATA5, and GATA6) in ZNF384-rearranged ALL, and nuclear receptor family TFs (ESR1, ESR2, RARA, and THRB) in PAX5-altered ALL that all had strong subtype-enriched TF footprints.
Because DNA consensus motifs can be highly redundant within TF families, we integrated subtype-enriched TF footprints with DEGs uniquely upregulated in each subtype to identify candidate TFs from these TF families that are upregulated in the corresponding B-ALL subtype.This analysis identified HOXA9 and HOXC9, RARA, and GATA3 as upregulated genes in KMT2A-rearranged, PAX5-altered, and ZNF384-rearranged subtypes, respectively (Figure 5B).In addition, DUX4 (DUX4-rearranged) and MEIS1 (KMT2A-rearranged) were also identified as upregulated TF genes with subtype-enriched TF footprints (Figure S10).
To determine whether these upregulated TFs promote unique chromatin-accessibility landscapes among B-ALL subtypes, we also performed TF footprinting on subtype-enriched DASs by comparing differential footprint scores between each B-ALL subtype group and non-subtype patient sample group (Figures 5C and S11).These data also supported a role of DUX4 in DUX4-rearranged ALL, ZNF384 and GATA3 in ZNF384-rearranged ALL, and HOXA9 and MEIS1 in KMT2A-rearranged ALL in the generation of subtype-specific chromatin landscapes (Figures 5C and S11).Additionally, our subtype versus non-subtype findings for ETV6::RUNX1 further confirm the importance of factors binding GGAA DNA-sequence repeats (EWSR1-FLI1, MA0149.1) as previously characterized for the ETV6::RUNX1 subtype 27 (Figure S11).In complement to analysis of subtypeenriched DASs, we also examined TF footprints among subtype-depleted DASs by again comparing each B-ALL subtype group and non-subtype group.Transcriptional repressors such as ZNF135, ZNF263, ZEB1, and ZEB2 had higher footprint scores across subtype-depleted DASs for multiple subtypes, suggesting a common set of TFs promoting subtype-depleted DASs (Figure 5C and Data S1).

Predictive potential of B-ALL subtype-enriched DASs
We determined how well chromatin accessibility can predict B-ALL subtypes by constructing a stepwise principal component analysis-linear discriminant analysis (PCA-LDA) classification model using the 42,753 subtype-enriched DAS ATAC-seq read count matrix as initial input across ten B-ALL subtypes harboring known molecular drivers (outlined in Figure 6A).Notably, the constructed classification model was tested with leave-one-out cross-validation at an accuracy of 89%.The most common failure was incorrect classification of BCR::ABL1 and Ph-like subtypes (Figure 6B), as has been observed with other ALL classification algorithms. 28Taking this into account by grouping BCR::ABL1 and Ph-like subtype samples into a common class yielded a recalculated cross-validation accuracy of 91%.Visualization of B-ALL subtype separations using select dimensions output by the LDA model demonstrates distinct groupings of related subtypes emphasizing classification-model performance (Figure 6C).
As a further validation of our classification model, we applied the classification algorithm to a separate ATAC-seq validation cohort comprising 24 B-ALL patient samples of known subtype covering ETV6::RUNX1, DUX4-rearranged and hyperdiploid subtypes from our previous work. 29The classification model was able to assign the correct subtype with 100% accuracy among the 24 B-ALL patient samples in the validation set.Classification-model performance was visualized by projecting the validation cohort onto the original training model LDA dimensions, yielding a distinct clustering of training samples with validation samples (Figure 6D).Collectively, these data support the utility of chromatin structure and subtype-enriched DASs in B-ALL subtype classification.

Mapping inherited DNA-sequence variants that impact chromatin accessibility
To determine how germline variation impacts chromatin accessibility, we identified chromatin-accessibility ATAC-QTLs in a subset of 69 patient samples with available SNP genotyping information and allele-specific ATAC-seq read counting using RASQUAL. 30In total, 9,080 ATAC-QTLs were identified representing both directionalities, with reference or alternative alleles increasing chromatin accessibility (FDR < 0.1; Figure 7A; Table S8).Manual quantification and scaling of allele-specific read counts for select ATAC-QTLs identified with RASQUAL demonstrated a clear concordance and directionality among individual patient samples classified into genotype groups (Figure 7B).Visual inspection of merged read counts from patient samples grouped into reference allele homozygote, heterozygote, or alternative allele homozygote for select ATAC-QTLs further supports the high-quality nature of identified ATAC-QTLs (Figure 7C).We further determined that 218 ATAC-QTLs were also lead expression QTL (eQTL) SNPs when compared to gene-tissue expression (GTEx) eQTLs 31 from relevant tissues (blood and lymphoblastoid cells), with 85% also concordant for allele over-representation directionality (Figure 7D; Table S9).ATAC-QTLs were also compared with inherited genome-wide association study variants for ALL disease susceptibility, which identified rs3824662 (GATA3) 32 and rs17481869 (2p22.3) 33as ATAC-QTLs that were associated with the risk of developing B-ALL.Further supporting the validity of our methodology, rs3824662 was also identified as an ATAC-QTL in ALL PDX samples, 34 and we functionally validated differential allele-specific activity for rs17481869 in multiple B-ALL cell lines (Figure S11).
To infer the impact of TF binding in control of chromatin accessibility at ATAC-QTLs, we overlapped ATAC-QTL loci with TF motifs determined as TF-bound by footprint profiling. 12Nearly one-third (28.8%; 2,615/9,080 ATAC-QTLs) of these ATAC-QTLs overlapped a TF-bound motif footprint across multiple B-ALL subtypes, suggesting that most ATAC-QTLs do not have a clear TF-binding mechanism regarding how they impact chromatin accessibility.Analysis of bound TF motif footprint prevalence at ATAC-QTLs identified several ETS family TFs (EHF, ELF3, SPI1/PU.1, and SPIB), zinc-finger TFs (ZNF263, Cell Genomics 3, 100442, December 13, 2023 9 (legend continued on next page) ZNF460, ZNF740, and ZNF148), and CTCF as the most altered motifs leading to differences in chromatin accessibility between alleles (Figure 7E).6][37][38] Collectively, these data identify inherited DNAsequence variants contributing to chromatin heterogeneity among B-ALL subtypes and indicate specific TFs of interest for further exploration of ATAC-QTLs.

DISCUSSION
Our study provides a large-scale examination of chromatin accessibility in the B-ALL genome across an expansive set of B-ALL subtypes.We further integrated these data with ChIPseq histone modification enrichment in primary B-ALL cells and three-dimensional chromatin looping data using promoter capture Hi-C in multiple patient samples and B-ALL cell lines.
Our data demonstrate that most regions of chromatin accessibility harbor activating chromatin marks consistent with cis-regulatory elements involved in gene regulation, and we further confirmed direct looping to gene promoters for approximately 50% of accessible chromatin sites.However, this does not rule out more transient chromatin looping interactions difficult to detect by current chromatin conformation capture genomic techniques.
Extensive epigenomic reprogramming was uncovered between B cell progenitors and B-ALL, and cell-of-origin analyses identified Pro-B cells as the most common cell of origin.Our comparison of B-ALL and Pro-B cell chromatin accessibility suggests epigenomic reprogramming that is, in part, associated with AP-1 TF occupancy.We further identify disruptions to normal B cell function through the activation of Toll-like receptor signaling and interleukin production.Acetyl-CoA synthesis was also identified as an enriched gene ontology term when comparing B-ALL and Pro-B cells.Metabolic alterations in cancer are well known, particularly acetyl-CoA synthesis alterations, which have been previously reported in cancer. 39e further examined accessible chromatin landscapes among diverse molecular subtypes of B-ALL.Collectively, we identified 42,753 subtype-enriched DASs, which strikingly represent 20% of analyzed accessible chromatin sites across a pansubtype B-ALL genome.Subtype-enriched DASs were enriched near upregulated DEGs in the corresponding subtype, supporting their role in gene activation.Moreover, comparisons between subtype-enriched DASs and chromatin-accessibility data from cell lines identified largely consistent patterns.We further identified candidate TFs that exhibited strong subtype specificity through TF footprinting analyses and validated some of these findings using transcriptomic data from primary B-ALL cells.Collectively, these analyses highlighted a putative role for HOXA9 and MEIS1 in KMT2A-rearranged ALL, GATA3 in ZNF384-rearranged ALL, and RARA in PAX5-altered B-ALL.
We further confirmed the previously reported roles of DUX4 and ZNF384 in DUX4-rearranged and ZNF384-rearranged ALLs, respectively, and our more focused analysis of subtypeenriched DASs confirmed many of these TF hits.1][42] Our identification of numerous HOX TFs with enriched footprints in KMT2A-rearranged ALL is also consistent with observations of HOX gene dysregulation in this subtype. 43urther supporting our results, ZNF384 fusion proteins in ZNF384-rearranged ALL are known to upregulate GATA3 expression. 44,45Although a direct role for RARA in PAX5-altered B-ALL has not been established, previous work has identified PAX5 as a target gene of the PLZF-RARA fusion protein in acute promyelocytic leukemia. 46Moreover, both RARA and PAX5 genes can form fusions with PML in acute promyelocytic leukemia 47 and ALL, 48 respectively.While PAX5-altered ALL has not been well connected to RARA nuclear receptor signaling, there has been previous work treating IKZF1-mutated BCR-ABL1 ALL with RARA and RXR agonists that suppressed a selfrenewal phenotype. 49Collectively, these data warrant further investigation of RARA and RXR signaling in PAX5-altered ALL.
In addition to key subtype-enriched TF footprints identified within accessible chromatin sites, we also assessed subtypedepleted DASs and identified numerous TFs that have been shown to act as repressors, such as ZNF135, ZNF263, ZEB1, and ZEB2.Intriguingly, several studies have supported a role for these repressors in cancers such as neuroblastoma and AML demonstrating enhanced tumorigenic phenotypes. 50,51ur observations support a more detailed examination of these repressive TFs in the occlusion of accessible chromatin sites in B-ALL.
Supporting the utility of chromatin accessibility in B-ALL classification, subtype-enriched DASs predicted subtypes with 89% accuracy.As a comparison to chromatin accessibility, transcriptional profiling using ALLSorts correctly assigned B-ALL subtypes with 92% accuracy. 28However, this RNA-sequencing (RNA-seq) dataset included over 1,223 transcriptomes from 18 subtypes, representing a considerably larger dataset for model development.We therefore suspect that additional chromatinaccessibility profiling across more B-ALL subtypes and increased sample sizes will lead to even better subtype prediction that will rival transcriptomic profiling and, importantly, incorporate intergenic heterogeneity that can elucidate cis-regulatory drivers of B-ALL leukemogenesis.
To identify the role of inherited DNA-sequence variation on the B-ALL chromatin landscape, we mapped over 9,000 ATAC-QTLs (FDR < 0.1).A large subset of ATAC-QTLs mapped to TF footprints and was concordant in allelic biases with GTEx lead eQTLs.6][37][38] In addition, we identified ZNF263 as a top hit, which is consistent with our demonstration of enrichment for this transcriptional repressor at subtype-depleted DASs.Further validating our analysis, we functionally validated a variant (rs17481869; 2p22.3)associated with susceptibility to ALL. 33 Collectively, this analysis suggests that chromatin accessibility is additionally modified by inherited DNA-sequence variation, thereby further contributing to increased chromatin heterogeneity in B-ALL.
Overall, our data support pronounced changes in chromatin accessibility between B-ALL and precursor B cells as well as among B-ALL subtypes.Our results further support the role of diverse TFs and inherited genetic variants in modulating and promoting differences in chromatin accessibility among B-ALL subtypes.Ultimately, these diverse chromatin architectures contribute to unique gene-regulatory networks and transcriptional programs.Our work therefore provides a valuable resource to the cancer genomics research community and can be further used to better understand biological as well as clinical differences among B-ALL subtypes.

Limitations of the study
Our study provides a valuable resource for the genomics and cancer research communities.Because this work represents a resource, a key limitation is the lack of mechanistic insights with extensive experimental validation.Indeed, many of our analyses, such as comparisons with B-ALL cell lines and B-ALL RNA-seq datasets, were performed to validate the accuracy and robustness of our ATAC-seq data.Our identification of Pro-B cells as the most common cell of origin was identified in mouse models, 52 and a role for AP-1 TFs in B-ALL was also described in KMT2A-rearranged ALL. 53We also identified known B-ALL molecular drivers (e.g., DUX4 and ZNF384 in DUX4-rearranged and ZNF384-rearranged ALL, respectively) and previously documented gene-regulatory alterations in B-ALL (e.g., HOXA9 and MEIS1 activity in KMT2A-rearranged ALL).Collectively, these data support the validity of our ATACseq dataset.We further mapped key TF-target gene interactions that are enriched in B-ALL compared to progenitor B cells.Additional work, including the generation of H3K27ac ChIP-seq and promoter capture Hi-C data in Pro-B cells and other B cell progenitors, is required to establish the extent of regulatory rewiring in B-ALL by these TFs.We also identified multiple additional TFs, including transcriptional repressors, in driving unique chromatin architectures among B-ALL subtypes.Additional experimentation is likewise required to validate the roles for these TFs in maintaining or establishing chromatin architecture and in subtype biology.Our classification model was able to predict molecular subtypes with high accuracy.We believe that additional ATAC-seq datasets, including data from rarer subtypes not interrogated in this study, will improve the accuracy of chromatin accessibility in discriminating B-ALL molecular subtypes.Further functional experimentation is also required to validate the effects of ATAC-QTLs on TF-binding events and neighboring gene expression and to determine whether these inherited genetic variants are associated with additional B-ALL cellular phenotypes or even clinical phenotypes in patients.

RESOURCE AVAILABILITY
Lead contact Further information and requests for resources, including associating patient ATAC-seq data with other functional genomic data (e.g., RNA-seq) from the same patient biospecimens that are available on St. Jude Cloud (https://www.stjude.cloud/),should be directed to and will be fulfilled by the lead contact, Daniel Savic (daniel.savic@stjude.org)

Materials availability
Cell lines and plasmids generated in this study are available upon request from the lead contact.
Data and code availability ATAC-seq, H3K27ac ChIP-seq, CUT&RUN-seq, and promoter capture Hi-C from patient or cell line origin biospecimens have been deposited to NCBI Gene Expression Omnibus (GEO: GSE211631).Key code used in the analysis of data is available at: https:// github.com/Savic-Lab/B-ALL_Chromatin_Landscapeand https://doi.org/10.5281/zenodo.10018584.In the event of lead contact unavailability, guidance for acquiring additional data associated with patient samples can be requested from Kristine Crews (kristine.crews@stjude.org).S1.All patients or their legal guardians provided written informed consent.The use of these samples was approved by the institutional review board at St. Jude Children's Research Hospital.Sample size estimation was not performed prior to analyses.Patient samples were allocated to experimental groups (general B-ALL or B-ALL subtypes) based upon a combination of immunophenotype, cytogenetic and RNA transcript profiling.Detailed demographics of human patient samples were not available to incorporate into analyses.Differential chromatin accessibility loci localized to sex chromosomes should be utilized with caution.

EXPERIMENTAL MODEL AND STUDY
Cell lines ALL cell lines utilized in this study (SUPB15, 697, BALL1, SEM, REH, Nalm6, RS411, JIH5) were cultured in RPMI 1640 medium (GibCo 2492873) supplemented with 1% GlutaMAX (GibCo 35050061) 10% FBS and maintained at a target cell density in the range of 1 -3x10 6 cells/mL.JIH5 cells were cultured in the same medium containing 20% FBS.Cell lines were authenticated with PowerPlex Fusion STR (Promega) profiling and screened for mycoplasma using the MycoScope PCR detection kit (Genlantis).

Figure 1 .
Figure 1.Chromatin-accessibility landscapes in B-ALL (A) Number and genomic location of accessible chromatin sites for ten B-ALL subtypes and B-other samples is provided.(B) Percentage of B-ALL accessible chromatin sties that map to H3K4me1 and/or H3K27ac active histone marks (active; green), H3K27me3 and H3K4me1 and/or H3K27ac bivalent or poised histone marks (bivalent or poised; yellow), and H3K27me3 only repressed histone marks (repressed; red).

Figure 2 .Figure 3 .
Figure 2. B-ALL cell type of origin defined by chromatin accessibility (A) Differentiation timeline of B cell progenitors from least differentiated to most differentiated.HSC, hematopoietic stem cell; MPP, multipotent progenitor cell; LMPP, lymphoid-primed multipotent progenitor cell; CLP, common lymphoid progenitor cell; PreProB, prePro-B cell; ProB, Pro-B cell; CD19 + CD20 + , B cell.(B) Heatmap using row-wise hierarchical clustering of B cell progenitor or B-ALL patient sample variance-stabilized ATAC-seq signal from DESeq2 across B cell progenitor-defining chromatin loci.B cell progenitor groups most similar to B-ALL patient samples (preProB and ProB) are outlined in yellow.(C) Confusion matrix showing number (listed) and percentage (color coded) of B cell progenitor truths and predictions for leave-one-out cross-validation of a k-nearest-neighbor classifier model.(D) Distribution of B cell progenitor classification across B-ALL patient samples using a k-nearest-neighbor classifier model trained with B cell progenitor data.

Figure 4 .
Figure 4. Mapping differential accessibility among B-ALL molecular subtypes (A) Heatmap of variance-stabilized ATAC-seq signal as Z score across subtype-enriched DASs.Enrichment patterns for each subtype DAS set are shown on vertical axis and are grouped by B-ALL subtype patient sample on the horizontal axis.Ph-like and BCR-ABL subtype-enriched DASs are expanded on the right for clarity.(B) Pie chart shows the number and percentage of subtype-enriched DASs identified.(C) ATAC-seq signal-track examples of subtype-enriched DASs on the UCSC Genome Browser.(D) dCas9-KRAB repressor targeting schematic (left) and relative transcript levels for genes associated with subtype-enriched DASs in B-ALL cell lines (right).Seventy-two hours after doxycycline (100 ng/mL) induction of SEM (KMT2A-rearranged) and NALM6 (DUX4-rearranged), B-ALL cell lines expressing doxycycline-inducible dCas9-KRAB and transduced with negative control single-guide RNAs (sgRNAs) (non-coding and non-targeting) or sgRNAs targeting subtypeenriched DASs (Enh1 and Enh2) are shown.Gene expression is normalized to the average of the two control sgRNAs (error bars denote ±standard error of the mean).Significance was calculated by a two-sample t test of combined biological replicates for both control sgRNAs versus both DAS-targeting sgRNAs.**p < 0.01; ***p < 0.001.

Figure 5 .Figure 6 .Figure 7 .
Figure 5. TF footprinting and gene-regulatory networks identify key TF drivers in B-ALL subtypes (A) Heatmap list of the topmost consistently differential TF footprints between all pairwise subtype-subtype comparisons (y axis; labeled to the right of the heatmap as TF motif identifiers) enriched in ten B-ALL subtypes (x axis; labeled on top of heatmap as Z score of differential TF footprint signal output by TOBIAS).(B) RNA-seq transcripts per million (TPM) expression of key TFs with subtype-enriched footprints that are also upregulated in the corresponding subtype (colored) versus all other subtypes (gray).DESeq2 differentially expressed gene FDR significance values are provided.(C) Top TF footprints at DASs that are enriched (top) or depleted (bottom) in KMT2A-rearranged (left) and DUX4-rearranged (right) B-ALL.Differential footprint score between merged subtype patient samples and non-subtype patient samples is provided on the x axis, and TF footprint significance is provided on the y axis.Higher differential footprint scores indicate higher binding in the merged subtype group compared to all other merged non-subtype samples.TPM transcript abundance of associated TF transcript in the merged subtype group is shown as both color and size of points.
PARTICIPANT DETAILS Human subjects Patient samples were obtained from: St. Jude Children's Research Hospital (Memphis, Tennessee), ECOG-ACRIN Cancer Research Group, The Alliance for Clinical Trials in Oncology, MD Anderson Cancer Center (Houston, Texas), Cook Children's Medical Center (Fort Worth, Texas), Lucile Packard Children's Hospital (Palo Alto, California), The University of Chicago (Chicago, Illinois), Novant Health Hemby Children's Hospital (Charlotte, North Carolina) and Children's Hospital of Michigan (Detroit, Michigan).A list of patient biospecimens is provided in Table Detailed methods are provided in the online version of this paper and include the following: