Repurposing of KLF5 activates a cell cycle signature during the progression from a precursor state to oesophageal adenocarcinoma

Oesophageal adenocarcinoma (OAC) is one of the most common causes of cancer deaths. Barrett’s oesophagus (BO) is the only known precancerous precursor to OAC, but our understanding about the molecular events leading to OAC development is limited. Here, we have integrated gene expression and chromatin accessibility profiles of human biopsies and identified a strong cell cycle gene expression signature in OAC compared to BO. Through analysing associated chromatin accessibility changes, we have implicated the transcription factor KLF5 in the transition from BO to OAC. Importantly, we show that KLF5 expression is unchanged during this transition, but instead, KLF5 is redistributed across chromatin to directly regulate cell cycle genes specifically in OAC cells. This new KLF5 target gene programme has potential prognostic significance as high levels correlate with poorer patient survival. Thus, the repurposing of KLF5 for novel regulatory activity in OAC provides new insights into the mechanisms behind disease progression.


Introduction
Oesophageal cancer is the eighth most common cancer worldwide, and its 5-year survival rate of 15% makes it the sixth most-common cause of cancer-related death (Ferlay et al., 2015;Pennathur et al., 2013). A subtype of oesophageal cancer, oesophageal adenocarcinoma (OAC), is the predominant subtype in many Western countries and its incidence is rising rapidly (Coleman et al., 2018). Patients with OAC often present at a late stage with advanced disease (Smyth et al., 2017). The lack of molecular knowledge of OAC, combined with lack of tailored therapies, contribute to the low survival of OAC patients.
The accepted model of OAC development is the progression from an intestinal metaplastic condition of the lower oesophagus, known as Barrett's oesophagus (BO), to OAC through increasing stages of dysplasia (Burke and Tosh, 2012;Spechler and Souza, 2014). Many mutations found in OAC are also present in BO, especially TP53, which suggests a stepwise transition to OAC (Ross-Innes et al., 2015;Stachler et al., 2015). Focal amplifications differ as they largely occur in OAC compared to BO (Lin et al., 2012;Stachler et al., 2015;Yamamoto et al., 2016). The amplified genes can be grouped into functional biological pathways with the RAS-ERK signalling pathway (e.g. ERBB2; EGFR; KRAS) and GATA transcription factors (GATA4; GATA6) being the most common (Frankell et al., 2019;Lin et al., 2012;Cancer Genome Atlas Research Network et al., 2017). The morphology of BO differs from the oesophageal epithelia by the presence of a columnar epithelium and secretory goblet cells, rather than squamous epithelium (reviewed in Spechler and Souza, 2014). Genomic and transcription events have been observed to differ between BO and OAC. Mutations in TP53 are more frequent in BO from patients that had progressed to OAC (Stachler et al., 2018) and SMAD4 mutations appear to occur exclusively in OAC, although at a low frequency (Weaver et al., 2014). Increased TGFb signalling through other SMAD family members, SMAD2/3, promotes growth in OAC cells (Blum et al., 2019). Additionally, increased expression and increased activity of AP-1 transcription factors occurs in the transition from BO to OAC (Blum et al., 2019;Britton et al., 2017;Maag et al., 2017). Despite these studies, the definitive molecular mechanisms of progression to OAC are poorly understood and biomarkers to identify patients at risk of progression are lacking.
Changes to the chromatin landscape have been implicated in many cancers and chromatin accessibility changes during tumourigenesis are a major factor in altering regulatory element activity (Britton et al., 2017;Corces et al., 2018;Davie et al., 2015;Denny et al., 2016;Kelso et al., 2017;Rendeiro et al., 2016;Tome-Garcia et al., 2018;Zhou and Guo, 2018). We recently used Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) to ascertain the molecular basis of BO and identified a set of transcription factors that define the BO chromatin landscape and are retained in OAC (Rogerson et al., 2019). Here, we took a similar approach to discover important transcriptional regulators ( Figure 1A) that are specifically operational in OAC and hence contribute to the molecular basis of disease progression from BO to OAC. We compared the open chromatin landscape in BO and OAC patient biopsies and uncovered KLF5 as an important transcriptional regulator that is repurposed to directly drive a cell cycle gene expression signature during the progression of BO to OAC.

Enhanced cell cycle activity defines BO progression to OAC
To begin to understand the molecular events that distinguish OAC form the BO precursor state we first established the differential gene expression profiles between BO and OAC. We analysed public human BO and OAC RNA-seq data (Maag et al., 2017). These samples separate well after principal component analysis (PCA), therefore we retained all samples for further analysis (Figure 1-figure supplement 1A). Performing differential gene expression analysis, we identified 905 differentially expressed genes between BO and OAC (±1.5 x; Q-value <0.05; Figure 1B; Supplementary file 1). eLife digest Acid fluids present in the gut can sometimes 'go up' and damage the oesophagus, the pipe that connects the mouth and the stomach. As a result, a small number of individuals can develop Barrett's oesophagus, a condition where cells in the lining of the lower oesophagus show abnormal shapes. In certain patients, these cells then become cancerous, but exactly how this happens is unknown. This lack of understanding contributes to late diagnoses, limited treatment and low survival rates.
Many cancers feature 'signature' mutations in a set of genes that controls how a cell can multiply. Yet, in the case of cancers of the lower oesophagus, known genetic changes have had a limited impact on our understanding of the emergence of the disease. Here, Rogerson et al. focused instead on non-genetic changes and studied transcription factors, the proteins that bind to regulatory regions of the DNA to switch genes on and off.
A close inspection of cancer cells in the lower oesophagus revealed that, in that state, a transcription factor called KLF5 controls the abnormal activation of genes involved in cell growth. This is linked to the transcription factor adopting a different pattern of binding onto regulatory regions in diseased cells. Crucially, when the cell growth genes regulated by KLF5 are activated, patients have lower survival rates. Further work is now required to examine whether this finding could help to identify patients who are most at risk from developing cancer. More broadly, the results from the work by Rogerson et al. demonstrate how transcription factors can be repurposed in a disease context.
Of these 905 genes, 465 are upregulated in OAC and 440 are downregulated in OAC compared to BO. To validate these findings, we analysed RNA-seq data from our own sample collection (3 BO and 3 OAC). Genes that were upregulated in OAC from the discovery dataset were significantly upregulated in the validation dataset and likewise for downregulated in OAC genes ( Figure 1C). To gain insights into biological pathways behind these differentially expressed genes, we used two approaches. Firstly, Gene Set Enrichment Analysis (GSEA) uncovered two cell cycle associated terms, 'G2M checkpoint' and 'E2F1 targets', as the most significant upregulated gene sets in OAC ( Figure 1D). Conversely, 'Fatty acid metabolism' and 'p53 pathway' are the most significant downregulated gene sets (Figure 1-figure supplement 1B). Secondly, biological pathway gene ontology analysis of upregulated genes revealed many cell cycle associated terms, such as 'Nuclear division', 'Regulation of mitotic cell cycle' and 'DNA replication' ( Figure 1E). Example genes such as CDC25B, CENPI and E2F1 all showed significant upregulation in OAC compared to BO in both datasets (Figure 1-figure supplement 1D). Downregulated genes uncovered metabolic associated terms, such as 'alcohol metabolic process', 'monocarboxylic acid metabolic process' and 'Lipid catabolic process' (Figure 1-figure supplement 1C). Representative example genes from these pathways such as IDI1, ADH4 and CIDEC all show significant downregulation in both datasets ( x, Q-value <0.05) expressed genes between human Barrett's oesophagus n = 13 and human OAC n = 12 samples (Maag et al., 2017). (C) Violin plots of expression of differentially expressed genes between Barrett's oesophagus (n = 13) and oesophageal adenocarcinoma (n = 12) from discovery dataset (D; Maag et al., 2017) and validation dataset (V; BO = 3; OAC n = 4). Genesets are shown for upregulated (left) and downregulated (right) in OAC. (D) Gene set enrichment analysis of differentially expressed genes. The top two upregulated gene sets are shown with normalised enrichment score (NES) and Q-value. (E) Biological pathway GO term analysis of upregulated genes. The top 10 terms are shown. See also cycle processes during the progression from BO to OAC accompanied with the inactivation of genes controlled by the p53 pathway and genes associated with metabolism.

Chromatin accessibility changes in the transition from BO to OAC
To identify putative transcriptional regulators that may drive the transition to OAC and impact on this enhanced cell cycle profile, we analysed the accessible chromatin landscape using ATAC-seq from patient biopsies. To supplement our previous ATAC-seq datasets from BO and OAC patients (Britton et al., 2017;Rogerson et al., 2019), we performed ATAC-seq on two additional OAC biopsies, which were quality-checked and reproducible (Figure 2-figure supplement 1A and B). We wanted to focus on the differentially expressed genes in OAC compared to BO, therefore we generated a set of accessible regions representing potential regulatory regions that are associated with this set of genes. We took all ATAC-seq peaks from all samples within +/-250 kb of a TSS of a differentially expressed gene ( Figure 2A). After merging so that only unique peaks remained, 35,220 regions were used for further analyses (Supplementary file 2). We first performed principal component analysis on normalised ATAC-seq signal of all BO and OAC samples to identify differences between samples ( Figure 2B). This led to clustering of all BO samples and clustering of most OAC samples. OAC samples T_003 and T_005 did not cluster with the other OAC samples and were therefore removed from the subsequent differential accessibility analysis. We then carried out differential accessibility analysis between BO and OAC on this peak set ( Figure 2C; Supplementary file 2). A total of 1495 regions were significantly differentially accessible (±2 x; Q-value <0.1), the majority of which increased in accessibility (1327/1495). An example gene locus which shows differential accessibility in OAC is centred on KRT19 ( Figure 2D). Within this locus, both gene promoters (14%; 1/7) and distal regulatory regions (86%; 6/7) gain accessibility in OAC. To assess whether the observed changes of accessibility near differentially expressed genes are common to other OACs, we compared our ATAC-seq data to independent, previously published ATAC-seq datasets from TCGA-ESCA oesophageal adenocarcinoma samples ( Figure 2D, bottom; Figure 2E; Corces et al., 2018). TCGA-ESCA samples showed similar open chromatin peak profiles and clustered with our OAC samples with the exception of one sample, which clusters with our BO samples ( Figure 2E). The chromatin accessibility profiles nearby genes differentially expressed in OAC are therefore reproducible across patients.
Next, we harnessed the differential accessibility data to uncover the identities of transcription factors bound in these regions. De novo motif discovery of regions that become more accessible in OAC contain significantly enriched motifs for AP-1, KLF, TBX, NFkB and p53 transcription factor families ( Figure 3A; Supplementary file 3). AP-1 and KLF were clearly the most frequent motifs in the differential regions and showed the strongest match score for the consensus motif. Regions that showed decreased chromatin accessibility in OAC are enriched in EWSR1-FLI1, ASCL2, GLI2, E2F and ZBTB18 motifs, albeit with relatively low match scores (Figure 2-figure supplement 1C; Supplementary file 3). To further assess which transcription factors might be involved in gene expression control, we carried out footprinting analysis on differential accessible regions from our ATAC-seq datasets ( Figure 3B; Bentsen et al., 2020). In differential accessible regions, motifs for KLF (e.g. KLF4, KLF5 and KLF1) and AP-1 (e.g. FOS, JUNB, JUND, JUN and FOSL1/2) transcription factors showed the highest footprinting score in OAC, whereas motifs for homeobox transcription factors (e.g. HNF1A, HOXA5 and NKX2-5), ARID3A and MEF transcription factors (e.g. MEF2A and MEF2C) showed more footprinting in BO. To provide more evidence for transcription factor occupancy, we then plotted ATAC-seq signal across their motifs. Both FOS (AP-1) and KLF4 (KLF) motifs show a clear increase in footprint depth in OAC, indicative of more transcription factor binding ( Figure 3C). We have previously identified AP-1 as an important regulator in OAC (Britton et al., 2017), but the role of KLF transcription factors in OAC is poorly understood. We therefore focussed on the potential role of KLF transcription factors in the progression of BO to OAC.

KLF5 controls expression of cell cycle genes in OAC
To identify a specific KLF transcription factor that may be bound to these accessible regions, we analysed the expression of individual KLF transcription factors in OAC samples ( Figure 3D). KLF5 was clearly the highest expressed among the KLF family in OAC. KLF5 has been previously implicated in oesophageal squamous cell carcinoma as a tumour suppressor (Tarapore et al., 2013) and has been identified as pro-tumorigenic in gastric cancer via amplifications (Chia et al., 2015). To determine the gene regulatory functions of KLF5, we carried out siRNA-mediated knockdowns of KLF5 in OE19 cells, a cell line we identified as having a similar chromatin landscape to OAC biopsies (Rogerson et al., 2019) and exhibits strong tumourigenic properties (Hassan et al., 2017). Knockdown of KLF5 was evident after 3 days siRNA transfection (    Supplementary file 4). Biological pathway GO term analysis revealed several enriched terms including 'DNA replication' and 'Regulation of mitotic cell cycle' for downregulated genes, and terms involving 'Oxidative phosphorylation' and 'mitochondrial gene expression' for upregulated genes (Figure 3-figure supplement 1D and E). The terms associated with downregulated genes are reminiscent of the terms enriched in genes upregulated in OAC (see Figure 1). Moreover, GSEA also found similar gene sets: 'mitotic spindle'; 'G2M checkpoint' and 'E2F targets' for downregulated genes and 'oxidative phosphorylation'; 'xenobiotic metabolism' and 'fatty acid metabolism' for upregulated genes ( Figure 3-figure supplement 1F and G). Since the genes regulated by KLF5 are involved in similar processes as the genes aberrantly expressed in OAC, we asked whether any of the same genes are in each dataset. 21% (97/465) of the genes upregulated in OAC significantly overlap with those downregulated with siKLF5 ( Figure 3E) and many of these are associated with cell cycle related functions, including genes encoding core cell cycle proteins like CCNE1, E2F1 and various MCM proteins ( Figure 3-figure supplement 1H). Further analysis of the biological pathways enriched within these 97 genes identified very similar GO terms to those enriched in genes upregulated in OAC compared to BO ( Figure 3F). GSEA also identified the same gene set terms: 'G2M checkpoint' and 'E2F1 targets' ( Figure 3G). Next, we asked whether these genes are directly regulated by KLF5, and carried out replicate ChIP-seq for KLF5 in OE19 cells which were highly correlated ( Focussing on the 97 genes that are upregulated in OAC and also downregulated after KLF5 depletion, 97% have a KLF5 ChIP-seq peak within 0.5 Mb of the TSS and the median distance between a KLF5 ChIP-seq peak and the TSS of all significantly downregulated genes was 11,975 bp (Figure 3-figure supplement 2E). In contrast, KLF5-binding regions are further away (>20 kb) from the TSS of genes that were either unaffected by KLF5 depletion or whose expression was increased. This is indicative of direct activation by KLF5. An example gene is CDC25B which harbours multiple KLF5 ChIP-seq peaks surrounding its locus ( Figure 3H). Collectively, these results suggest a direct activator role of KLF5 in controlling cell cycle genes in OAC.

The KLF5 cistrome is reconfigured during the progression from BO to OAC
Having determined a role for KLF5 in controlling cell-cycle-associated gene expression in OAC cells, we sought to determine the mechanism through which KLF5 acquires these functions. We first asked whether KLF5 expression changes in the transition from BO to OAC, however no increase in expression was found ( Figure 4A). An alternative mechanism might be through redistributing the binding of KLF5 to different regulatory elements in OAC. We therefore hypothesised that KLF5 is active in both BO and OAC but may regulate specific genes in OAC by binding at different loci.
CP-A cells are derived from non-dysplastic BO and do not exhibit strong tumourigenic properties (Lin et al., 2012) so we compared KLF5 expression in BO-derived CP-A and OAC-derived OE19 cells and found that KLF5 is expressed at similar levels in (Figure 4-figure supplement 1A). We therefore we used these cell-lines to model KLF5 activity in BO and OAC. To gain a more comprehensive view of KLF5 function, we performed ChIP-seq for KLF5 in CP-A cells and used spike-in normalisation to better assess differential binding relative to OE19 cells. Anti-KLF5 antibodies precipitated KLF5 in CP-A cells (  We next asked whether the regions that exhibit differential binding are enriched for specific transcription factor motifs. Regions that are bound by KLF5 in OE19 cells are enriched in motifs for KLF, GATA, Forkhead, AP-1 and TCF transcription factors, whereas regions bound by KLF5 in CP-A cells are enriched for a different set of motifs with, TEAD, RUNX and p53 transcription factor families in addition to KLF and AP-1 motifs detected ( Figure 4E; Supplementary file 6). These results are inkeeping with our previous work showing AP-1 and GATA6 functionality in OAC (Britton et al., 2017;Rogerson et al., 2019). Regions specifically bound by KLF5 in OAC cells also exhibited increased accessibility in OE19 cells and importantly, accessibility is also elevated around these binding sites in OAC tissue (Figure 4-figure supplement 2A). These findings are therefore consistent with a broad role of KLF5 in OAC.
To further probe the potential biological significance of the differentially bound KLF5 regions, we associated these with the nearest gene and determined the enriched GO terms for genes associated with cell-type-specific KLF5 peaks that also show preferential expression in BO or OAC. OE19-specific KLF5-binding events are associated with genes involved in 'cell division' control, whereas CP-Aspecific KLF5 binding is associated with 'epithelial cell differentiation' (Figure 4-figure supplement 2B). The latter observation is consistent with the potential loss of cell identity in OAC. However, since oncogenic events during the progression from BO to OAC are poorly understood, we decided to focus on regions that acquire KLF5 binding in OE19 cells. To relate specific KLF5-binding events to gene expression changes, we took the set of 97 genes that are upregulated in OAC and downregulated by KLF5 depletion (i.e. activated by KLF5; Figure 3E) and found that there are 371 OE19specific KLF5-binding peaks within a 0.5 Mb locus centred on the TSS. To further explore how KLF5 activates these genes in OAC, we assessed the transcription factor binding motif distribution (identified in Figure 4E) within this set of OE19-specific KLF5 peaks. We detected KLF5 binding motifs in 257/371 of these regions, and strikingly, 56% (145/257) of the peaks also house a mixture of FOXA, AP-1, GATA and TCF motifs, in addition to the KLF motif ( Figure 4F; Figure 4-figure supplement 2C; Supplementary file 7). This suggests that KLF5 functions in a combinatorial manner with these other transcription factors to activate gene transcription during progression from BO to OAC. However, a large portion of these peaks (44%) contain only a KLF motif suggesting a more independent role for KLF5 in these regions ( Figure 4F). To test whether these motif enrichments reflect transcription factor binding, we integrated our ChIP-seq data of transcription factors active in OAC (GATA6 and HNF4A; Rogerson et al., 2019), with KLF5-binding data. Since only GATA motifs are enriched in these regions we would expect co-binding with GATA6 and not HNF4A. We therefore compared ChIP-seq profiles for these transcription factors, and see extensive co-binding of GATA6 at these sites but no evidence of co-binding with HNF4A ( Figure 4-figure supplement 2D). Finally, the predicted target gene co-regulation by KLF5 and GATA6 was validated by depletion of each factor in OE19 cells, which leads to a large significant overlap in downregulated genes ( Figure 4G). However, this co-regulated gene set contains only two of the cell cycle associated genes regulated by KLF5,  suggesting that this combination of transcription factors is not directly involved in controlling this process. Turning back to the cell cycle genes directly activated in OAC through OAC-specific binding of KLF5, we tested whether KLF5 is relevant for their expression in CP-A cells. As expected from the lower KLF5-binding levels in these cells, depletion of KLF5 had little effect on these genes (Figure 4-figure supplement 2E), consistent with a newly acquired function in OAC cells.
To establish whether the 371 KLF5-bound regions that are associated with KLF sensitive genes are relevant to OAC, we turned back to our ATAC-seq data and clustered the data to reveal two clusters. One set of regions is already partially open in CP-A cells that increase in accessibility in OE19 cells (cluster 1) and another set are closed in CP-A cells and become more accessible (cluster 2) ( Figure 4H, Figure 4-figure supplement 2F, left). Importantly, the same pattern of accessibility is evident using ATAC-seq signal from BO and OAC tissue ( Figure 4H, Figure 4-figure supplement 2F, right). To identify any potential differences between these clusters, we performed motif analysis (Figure 4-figure supplement 2G; Supplementary file 8). The most common motif in both clusters were KLF motifs and the most striking difference is the large proportion of AP1 motifs specifically associated with cluster one suggesting a potential role for AP1 in priming binding of KLF5 to these regions.
Together, these results indicate an altered DNA-binding profile for KLF5 in BO and OAC, and this altered binding is associated with chromatin opening. This altered binding profile for KLF5 in OAC reflects a direct role in controlling genes involved in cell cycle.

KLF5 converges with ERBB2 on cell cycle gene regulation and controls cell proliferation in OAC
Our results indicate a role of KLF5 in controlling increased cell cycle gene expression in OAC; however, it is unclear how this relates to genetic events that potentially impact on the same process. Genomic amplifications in signalling receptors are common in OAC, such as ERBB2 (32% OAC have an ERBB2 amplification; Cancer Genome Atlas Research Network et al., 2017) and occur during the transition from BO to OAC (Stachler et al., 2015). As the ERK pathway is implicated in promoting cell proliferation and is controlled by ERBB2, we investigated whether ERBB2 signalling impacts on KLF5-mediated gene regulatory events. First, we sought evidence for a link with transcription factor activity, and performed ATAC-seq on OE19 cells to investigate whether depletion of ERBB2 could alter chromatin accessibility. OE19 cells contain an amplification of the ERBB2 locus (Dahlberg et al., 2004) and are dependent on ERBB2 for their proliferation (Hong et al., 2012). ERBB2 levels were efficiently reduced after 72 hr of siRNA treatment and phosphorylation of downstream targets (ERK and AKT) was reduced ( Figure 5-figure supplement 1A). ATAC-seq data were reproducible and good quality ( Figure 5-figure supplement 1B and C). We performed differential accessibility analysis, which identified 717 regions with decreased chromatin accessibility and 733 regions with increased accessibility ( Figure 5A; Supplementary file 9). De novo motif analysis of the regions that exhibit reduced chromatin accessibility following ERBB2 depletion, revealed that the majority contain AP-1-binding motifs as expected from the established connections between ERK pathway signalling and AP1 transcription factors. However, the binding motif for KLF transcription factors was also detected, albeit in a subset of the regions ( Figure 5B; Supplementary file 10). We then used our KLF5 ChIP-seq dataset from OE19 cells to validate KLF5 binding at regions with reduced chromatin accessibility following ERBB2 depletion ( Figure 5-figure supplement 1D). These regions are relevant in the context of OAC as they also show increased chromatin accessibility in OAC tissue compared to BO ( Figure 5C). The convergence of ERBB2 signalling on KLF5 transcription factor activity suggested that they might also converge on the same genes. We therefore carried out RNA-seq in OE19 cells treated with siRNA against ERBB2. The RNA-seq data were highly reproducible ( Figure 5-figure supplement 1E) and resulted in 778 genes down-and 664 genes up-regulated (two-fold change; FDR < 0.05, FPKM > 1) ( Figure 5-figure supplement 1F). There is a large, statistically significant overlap between directly activated KLF5 target genes and genes downregulated by ERBB2 depletion. Moreover, a closer comparison reveals that the expression of the majority of the directly activated KLF5 target genes was reduced upon ERBB2 knockdown ( Figure 5D). Most of these common target genes are cell cycle related. These results therefore indicate that ERBB2 and KLF5 converge on a similar set of regulatory regions to drive the expression of cell cycle regulatory genes. To establish whether ERBB2 can redistribute KLF5 binding and activate its target genes, we created BO-derived CP-A cell lines that stably over express ERBB2 to mimic the     effect of amplification seen in OAC. These cells exhibit high levels of ERBB2 expression, maintain ERK and AKT activation in serum starved conditions ( Figure 5-figure supplement 2A), and exhibit growth factor-independent proliferation ( Figure 5-figure supplement 2B). Several cell cycle related genes that are activated by KLF5 in OAC cells are also activated by ERBB2 overexpression in BO cells ( Figure 5-figure supplement 2C). However, we were unable to detect any increases in KLF5 occupancy at a panel of KLF5-binding regions associated with cell cycle genes ( Figure 5-figure supplement 2D). These findings therefore reaffirm the convergence of ERBB2 signalling and KLF5 on the activation of a cell cycle gene signature but ERBB2 is not sufficient to trigger KLF5 redistribution. Finally, we assessed whether defective KLF5-driven cell cycle gene regulation led to proliferative defects in OAC cells. We first depleted KLF5 in OE19 cells using siRNA which resulted in the reduction of KLF5 protein ( Figure 5-figure supplement 3A), and the growth of cells was significantly impeded after siKLF5 treatment ( Figure 5E). Second, we validated this growth defect by using CRISPR interference technology. Stable transfection of dCas9-KRAB and subsequent transfection of sgRNAs targeting the promoter of KLF5 (sgKLF5) into OE19 cells resulted in the reduction of KLF5 protein levels ( Figure 5-figure supplement 3B). CRISPRi knockdown of KLF5 also significantly reduced the growth of OE19 cells ( Figure 5F), mirroring the result with siKLF5. We further explored the role of KLF5 in cell growth and cell cycle progression by performing similar assays while perturbing KLF5 target genes (CCNE1, CDC25B, KIF14, CLSPN and NR4A1). All these genes showed significant reductions in expression upon siRNA treatment ( Figure 5-figure supplement 3C). The growth of OE19 cells was significantly reduced with the treatment of siRNA against CCNE1, KIF14 and CLSPN ( Figure 5G). Knockdown of these genes also significantly altered cell cycle patterns, particularly knockdown of CLSPN which induced a prominent S-phase block ( Figure 5-figure supplement  3D). These results provide more evidence for the role of KLF5 in the growth of cells and highlight the role of KLF5 target genes in this phenotype.
To assess whether the expression of KLF5 and its target genes has any clinical relevance, we sourced OAC expression and survival data (Cancer Genome Atlas Research Network et al., 2017) and plotted a survival of patients with high and low expression (±median) of KLF5 itself and KLF5 target genes up to 24 months ( Figure 5H). Those with a higher expression of KLF5 showed no difference in patient survival, whereas patients with high target gene expression exhibited a significantly lower survival rate compared to those with low expression. This result is in keeping with the hypothesis that it is the activation of KLF5 target genes by its redistribution across chromatin, rather than its expression level that is important. It is noteworthy that CLSPN expression alone is predictive of increased patient survival ( Figure 5H) and its enhanced expression in OAC compared to BO makes this a useful potential biomarker ( Figure 5I).
Collectively, these results confirm the functional role of KLF5 in cell cycle control in OAC and convergence of action with the ERBB2 signalling pathway. This is clinically important as patients with highly expressed KLF5 target genes have a worse prognosis that those without.

Discussion
Genome sequencing efforts of patients with BO and OAC have provided insights into the molecular causes of BO and OAC and show the mutational relationships between these disease states (Ross-Innes et al., 2015;Stachler et al., 2015). This has provided evidence for a model of OAC developing from BO. The molecular mechanisms involved in progression to OAC are poorly understood; however, BO offers a therapeutic window of opportunity to identify those more at risk of OAC development. In addition to genetic events, epigenetic changes and alterations to the chromatin landscape are also likely to play an important role in disease progression. Here, we demonstrate that there are marked changed in chromatin accessibility and associated gene expression, indicating active changes at the chromatin level during carcinogenesis. One of the major contributing factors to this change is the transcription factor KLF5. KLF5 is re-purposed in OAC cells and its chromatinbinding profile is massively rewired to drive increased expression of cell cycle associated genes ( Figure 5J). Conversely, this rewiring results in the loss of KLF5 binding to many regulatory regions occupied in Barrett's cells. This loss is potentially associated with the loss of cell identity, and may also contribute to the development of the cancer phenotype.
Cell cycle deregulation is one of the key hallmarks of cancer (Hanahan and Weinberg, 2011) and here we uncovered a cell cycle gene expression signature, comprised of genes that are overexpressed in OAC. Recent research identified the cell-cycle as a perturbed pathway in OAC and suggested the possibility of CDK4/6 inhibitors as a therapeutic treatment (Frankell et al., 2019;Mourikis et al., 2019). We have previously uncovered a deregulated FOXM1 regulatory network active in OAC, a key regulator of late cell cycle gene expression (Wiseman et al., 2015). By integrating ATAC-seq data to identify upstream regulators of this signature, we also uncovered AP-1 and KLF5 as putative transcription factors in this process. We have previously identified AP-1 as an important factor in OAC (Britton et al., 2017) and others have shown an increase in AP-1 family transcription factors between non-dysplastic BO and low-grade dysplastic BO (Maag et al., 2017). What is less clear is the role of KLF5 in the progression of BO to OAC. KLF5 has been shown to have a tumour promoting function in pancreatic (He et al., 2018) and basal-like breast cancer . KLF5 is also frequently amplified in gastric cancer (Chia et al., 2015;Zhang et al., 2018) and has recently been shown to regulate gene expression in OAC in combination with other transcription factors, GATA6, ELF3 and EHF (Chen et al., 2020). This was reinforced by a recent study that identified KLF5 as a master transcription factor on which OAC cell-lines were dependent (Reddy et al., 2019). Paradoxically, KLF5 has been shown to have a tumour suppressor role in oesophageal squamous cell carcinoma (Tarapore et al., 2013) and breast cancer (Chen et al., 2002). The expression of the related protein, KLF4, together with three other genes, was able to stratify OAC from BO, albeit KLF4 expression is reduced in progression from BO to OAC (Maag et al., 2017).
Previous studies have begun to suggest a role for KLF5 in cell cycle control. For example, KLF5 binds to a CCNE1 promoter proximal element in bladder cancer cells (Pattison et al., 2016) and KLF5 increases the expression of Ccnb1 and Mcm2 downstream of oncogenic Ras in fibroblasts (Nandan et al., 2005). Here, we provide evidence that KLF5 exhibits a widespread role; directly controlling cell proliferation through activation of cell cycle associated genes. We also show that reduction of KLF5 levels, or several of its target genes, in OAC cells impairs growth. Indeed, this is exemplified by CLSPN which may have therapeutic potential as its gene product, Claspin, has recently been shown to have a broader role in cancer cell viability by protecting cancer cells from replication stress (Bianco et al., 2019). KLF5 directly binds and regulates core cell cycle genes for example CDC25B, CCNE1 and MCM2, some of which are cell cycle transcription factors for example E2F1, MYBL2, thus providing a mechanism for propagating its effects on cell cycle control. We also show KLF5 expression is almost unchanged between BO and OAC. By profiling KLF5 chromatin binding in BO and OAC cells, we have demonstrated an altered KLF5 binding profile. The regions bound by KLF5 specifically in OAC cells are enriched in motifs for several transcription factors, including the GATA family which suggests a combinatorial regulatory code. This is in keeping with our finding that there is extensive overlap between the binding of KLF5 and GATA6 which is reinforced by recent studies that show that KLF5 binds with GATA6 in OAC (Chen et al., 2020) and gastric cancer (Chia et al., 2015).
The overlap in regulatory potential with GATA6 provides a plausible link to one of the major genetic events that drive the BO to OAC transition. Our work also suggests a link to another major pathway that is activated through gene amplification in OAC, the ERBB2-driven RAS-ERK pathway. Knockdown of ERBB2 reduced the expression of many KLF5 target genes and KLF5 motifs were found at regions with reduced chromatin accessibility upon ERBB2 knockdown. However, ERBB2 overexpression in BO cells is insufficient to trigger KLF5 redistribution, indicating that other pathways contribute to KLF5 redistribution in OAC, but this needs further investigation. Nevertheless, it is clear that ERBB2 signalling and KLF5 activity converge on the same cell cycle genes and both are required for their activation, indicating functional synergy. The signalling pathways are more unclear in the context of BO, the precancerous precursor. We see enrichment of the TEAD motif only in CP-A cells and not OE19 cells, suggesting that KLF5 may be operating through the Hippo signalling pathway in BO. In other contexts, KLF5 has been shown to cooperate with TEAD transcription factors, downstream of YAP/TAZ  and KLF5 is stabilised by YAP in breast cancer cells (Zhi et al., 2012). Further work is needed to substantiate these links in BO.
In summary, we have used integrative analysis of RNA-seq and ATAC-seq from BO and OAC patient samples to uncover a cell cycle signature regulated by KLF5. Using a multi-omics approach, we found an oncogenic role of KLF5 in OAC, a transcription factor that has not been shown to be mutated, amplified and/or over-expressed in OAC. This study highlights the power of supplementing expression data with genome-wide chromatin profiling methods such as ATAC-seq. This provides molecular insights into the mechanisms by which BO progresses to OAC and identifies a signature of transcription factor gene targets that have potential prognostic significance and could be used as biomarkers in the clinic.
Fresh frozen OAC 2 mm biopsies were obtained by consenting patients undergoing endoscopy. Tissue collection was granted by the ethics committee of Salford Royal NHS Foundation Trust (04/ Q1410/57). Patient consent was obtained in written form and signed by the patient and doctor. Patient biological replicates are defined as separate patients, and cell-line biological replicates are defined as separate cell-lines cultures, processed at the same time.

RT-qPCR
RT-qPCR was carried out using QuantiTect SYBR Green RT-PCR Kit (Qiagen, 204243) using the primer pairs detailed in Supplementary file 11. Relative gene expression was calculated using the DDCT method relative to levels of GAPDH mRNA.
RNA extraction, RNA-seq processing and analysis RNA was extracted from cells using a RNeasy RNA extraction Kit (Qiagen, 74136) and quality checked using Nanodrop 1000 (ThermoFisher). Paired-end RNA-seq libraries were generated using TruSeq stranded mRNA library kit (Illumina) and sequenced on a HiSeq 4000 platform (Illumina) by the University of Manchester Genomic Technologies Core Facility. Reads were trimmed using Trimmomatic v0.32 (Bolger et al., 2014) quality checked using FastQC (available at: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc) and aligned to RefSeq transcript annotation of GRCh37 (hg19) using STAR (Dobin et al., 2013). Reads aligned to chromosomes 1-22 and chromosome X were retained. The Cufflinks package v2.2.1 (Trapnell et al., 2012) was used to calculate gene expression levels using Cuffnorm, and to analyse differential gene expression using Cuffdiff. Default parameters were used in both instances. Significant gene expression changes were defined by a fold change of ±1.3 and a Q-value of <0.05. For ERBB2 knockdown experiments, counts for genes were determined using featureCounts (Liao et al., 2014). Log 2 transformed counts were obtained using DESeq2 variance stabilising transformation (VST) function.

Crystal violet assay
200,000 cells were plated on a six-well plate and siRNA/sgRNA treatment started after 24 hr incubation. At specific time-points after treatment plates were washed with PBS and fixed with 4% paraformaldehyde for 10 min. Plates were then washed twice with PBS and kept at 4˚C. Cells were then stained by first incubating plates at room temperature for 10 min in 0.1% Triton X-100 with gentle shaking and then incubated at room temperature for 30 min in 0.1% crystal violet (Sigma-Aldrich, HT90132) with gentle shaking. Plates were extensively washed with water multiple times and left to dry. The dye was solubilised with 10% acetic acid for 10 min with gentle shaking and absorbance was read at 590 nm. Values for siNT at each time-point were used as 100% growth.

Propidium iodine staining assay
Cells were trypsinised and collected as a single-cell suspension, washed with cold PBS, then fixed in 70% ethanol and stored at À20˚C for at least 2 hr. Cells were then resuspended in staining solution (50 mg/mL propidium iodide (Sigma, P4170), 100 mg/mL RNase (Sigma, R4642)) and incubated at room temperature for 30 min. Cells were analysed by the University of Manchester Flow Cytometry Core Facility on a LSRFortessa. Percentages of cells in each cell cycle phase were calculated using ModFit LT (http://www.vsh.com/products/mflt/).
For comparing BO and OAC ATAC-seq, the Cufflinks package v2.2.1 (Trapnell et al., 2012) was used to calculate chromatin accessibility levels using Cuffnorm, and differential chromatin accessibility was analysed using Cuffdiff. Default parameters were used in both instances. Significant chromatin accessibility changes were defined as a fold change of ±2 and a Q-value of <0.1.
For ERBB2 knockdown experiments differential accessibility was calculated using DESeq2 (Love et al., 2014). Alignment files of biological repeats were combined, peaks recalled and peaks from both conditions were then merged using using mergePeaks.pl (using d = 250 parameter) from the HOMER package v4.9 (Heinz et al., 2010) and resized to peak summit ±250 bp. featureCounts from the SUBread package (Liao et al., 2014) was used to count reads within peaks from ATAC-seq samples and these were used an input for DESeq2 to calculate differential binding using default settings. A linear fold change of ±2 and a Q-value of <0.05 were used as a cut-off for further analyses.

ATAC-seq data visualisation
ATAC-seq fragment size was visualised using a custom python script. Correlation plots between technical replicates were visualised using multiBamSummary and plotCorrelation from the deepTools package (Ramírez et al., 2016). Tag density plots and heatmaps were also generated using compu-teMatrix and plotProfile or plotHeatmap tools from the deepTools package. ATAC-seq counts were also visualised using Morpheus (https://software.broadinstitute.org/morpheus/) and hierarchical clustering was performed with this software using 1-Pearson's correlation unless otherwise stated. Correlation plots of samples were visualised using the similarity matrix tool from Morpheus.

De novo motif discovery
To analyse ATAC-seq or ChIP-seq peaks for enriched transcription factor motifs, genomic coordinates were analysed using findMotifsGenome.pl with -cpg -mask -size 200 -bg parameters from the Homer package (v4.7; Heinz et al., 2010). Background sequences were total accessible regions from all samples for ATAC-seq analysis and whole genome for ChIP-seq analysis.

ChIP-qPCR and ChIP-seq and analysis
ChIP-qPCR and ChIP-seq analysis was carried out as described previously (Wiseman et al., 2015). For ChIP-qPCR, 2.5 Â 10 6 cells and 1 mg antibody were used, and analysed using a Rotor-Gene SYBR Green PCR Kit (Qiagen, 204074). For ChIP-seq, 1 Â 10 7 cells, 5 mg target protein antibody, 1 mg Spike-in antibody (Active Motif, 61686) and 50 ml Protein A Dynabeads were used. 20 ng Spike-in Drosophila chromatin (Active Motif, 53083) was supplemented to chromatin preps for Spike-in normalisation, as described previously (Egan et al., 2016). Normal rabbit IgG (Millipore, 12-370) antibody was used in parallel as a control. DNA libraries were prepared using TruSeq ChIP sample prep kit (Illumina) and sequenced on a HiSeq 4000 (Illumina) platform. Sequencing reads were aligned to GRCh37 (hg19) and dm6 using Bowtie2 v2.2.3 (Langmead et al., 2009). Reads aligning to the Drosophila genome were counted and used to generate scale factors. BAM files were then scaled to the sample with the lowest number of Drosophila reads. Only reads with a mapping quality >q30 were retained. Peak calling was performed on individual replicates using MACS2 v2.1.1 (Zhang et al., 2008) using default parameters with additional -SPMR parameter. bedGraph files were converted to bigwig using BedGraphtoBigWig script and visualised in the UCSC Genome Browser. The overlap of peaks between two biological replicates was calculated using BEDtools v2.26.0 (Quinlan and Hall, 2010) using bedtools intersect with default settings with -f 0.3 parameter. Peaks present in both datasets were taken forward for further analysis.
Differential binding analysis was performed using DESeq2 (Love et al., 2014). Overlaps from biological repeats were merged using bedtools merge to generate a final set of peaks. featureCounts from the SUBread package (Liao et al., 2014) was used to count reads within peaks from ChIP-seq samples and these were used an input for DESeq2 to calculate differential binding using default settings. A linear fold change of ±2 and a Q-value of <0.05 were used as a cut-off for further analyses.

ChIP-seq visualisation
Heatmaps of ChIP-seq signal were generated using computeMatrix and plotHeatmap from the deepTools package (Ramírez et al., 2016). Tag density plots were generated using computeMatrix and plotProfile tools from the deepTools package. Correlation of biological replicates was visualised using multiBigwigSummary and plotCorrelation. Euler diagrams were generated using the Euler R package (available at eulerr.co).

Principal component analysis
Principal component analysis was performed using the prcomp function in R (v3.5.1, R Core Team, 2018) using log 2 transformed RNA-seq or ATAC-seq normalised counts. Principal component scores were then plotted in Excel.

Patient survival analysis
Average expression of KLF5 or KLF5 target genes (OAC upregulated and siKLF5 downregulated) was calculated per patient and patients were ranked by average target gene expression. The median was calculated and patients were classified as either above or under median expression. Survival (months) was plotted for each group and a Log-rank test was carried out using GraphPad Prism v8.

Statistical analysis
To determine statistical significance between two groups, a Student's unpaired two-tail T-test was carried out using GraphPad Prism v7. To assess the changes in expression of a group of genes, a one-way ANOVA test was carried out in GraphPad Prism v7. To assess the significance of gene/ region overlaps derived from sequencing data, a hypergeometric distribution test was carried out using the phyper function in R. p-values<0.05 were considered as significant.

Data access
All sequencing data are deposited in ArrayExpress. Additional OAC ATAC-seq data are available at E-MTAB-8447 and additional BO and OAC RNA-seq data are available at E-MTAB-8584. siKLF5 RNA-seq data are available at E-MTAB-8446. KLF5 ChIP-seq data are available at E-MTAB-8568. siERBB2 ATAC-seq and RNA-seq data are available at E-MTAB-8576 and E-MTAB-8579 respectively. CP-A ATAC-seq data are available at E-MTAB-8994.

Additional files
Supplementary files . Source code 1. ATAC fragment size visualisation.
. Supplementary file 1. Differentially expressed genes in OAC. Significantly (±1.5 x; Q-value <0.05) differentially expressed genes between BO (n = 13) and OAC (n = 12) (Maag et al., 2017). . Supplementary file 8. DNA motifs enriched in Cluster one and Cluster two regions. Top 10 motifs found by de novo motif discovery and their associated transcription factors that are enriched in cluster 1 (top) or cluster 2 (bottom).
. Supplementary file 9. Genomic coordinates of regions on OE19 cells that show a decrease in ATAC-seq signal upon treatment of siERBB2 for 72 hr.
. Supplementary file 10. De novo discovered motifs from regions that exhibit reduced chromatin accessibility upon treatment of siERBB2 for 72 hr. De novo motifs, % targets and % background, called transcription factor with match score and p-value are shown.
. Supplementary file 11. List of PCR primers used in RT-qPCR and ChIP-qPCR experiments.
. Transparent reporting form

Data availability
All sequencing data are deposited in ArrayExpress. Additional OAC ATAC-seq data are available at E-MTAB-8447 and additional BO and OAC RNA-seq data are available at E-MTAB-8584. siKLF5 RNA-seq data are available at E-MTAB-8446. KLF5 ChIP-seq data are available at E-MTAB-8568. siERBB2 ATAC-seq and RNA-seq data are available at E-MTAB-8576 and E-MTAB-8579 respectively.
The following datasets were generated: