Super-enhancers in transcriptional regulation and genome organization

Abstract Gene expression is precisely controlled in a stage and cell-type-specific manner, largely through the interaction between cis-regulatory elements and their associated trans-acting factors. Where these components aggregate in promoters and enhancers, they are able to cooperate to modulate chromatin structure and support the engagement in long-range 3D superstructures that shape the dynamics of a cell's genomic architecture. Recently, the term ‘super-enhancer’ has been introduced to describe a hyper-active regulatory domain comprising a complex array of sequence elements that work together to control the key gene networks involved in cell identity. Here, we survey the unique characteristics of super-enhancers compared to other enhancer types and summarize the recent advances in our understanding of their biological role in gene regulation. In particular, we discuss their capacity to attract the formation of phase-separated condensates, and capacity to generate three-dimensional genome structures that precisely activate their target genes. We also propose a multi-stage transition model to explain the evolutionary pressure driving the development of super-enhancers in complex organisms, and highlight the potential for involvement in tumorigenesis. Finally, we discuss more broadly the role of super-enhancers in human health disorders and related potential in therapeutic interventions.


INTRODUCTION
Enhancers are well known for regulating gene expression independently of their genomic location and orientation. The first eukaryotic enhancer was identified in the primate virus SV40 in the early 1980s. Banerji et al. observed that a remote viral element containing a 72-bp repeat sequence could enhance recombinant β-globin expression 200-fold when present on the same plasmid construct transfected into mammalian cells (1). Two years later, several independent groups reported that enhancers located in the mouse immunoglobulin (Ig) heavy chain gene loci could activate a nearby Ig promoter in cis specifically in lymphocytes, which demonstrated for the first time that enhancers function in a tissue-specific manner (2)(3)(4).
The term 'super-enhancer' was first used in 2004 by Chen and colleagues to describe a 651 bp segment of baculoviral genomic DNA designated hr3. They observed that this regulatory domain could stimulate activity of the ie-1 reporter gene promoter up to 7000-fold in transfected cells (5). Almost a decade later, Young and colleagues used the term 'super-enhancer' (SE) to characterize large genomic domains, conferring a key role in control of cell identity and disease (6)(7)(8). Using ChIP-seq data from the multiple tissue types available from the ENCODE and Roadmap Epigenome Projects (9,10), they were able to demonstrate that SEs span tens of kilobases (kb) of DNA sequence and are densely occupied by master transcription factors (TFs) and mediators. Collectively, these observations suggested that SEs play a key role in organizing the gene expression patterns that regulate cell identity (6)(7)(8). The Young definition of SE, in relation to developmentally important genomic segments, extends well beyond the early usage, which related to their performance in expression assays in vitro, and has become the established use of the terminology.
While there is now very substantial support for a paradigm in which SEs are a major regulatory component of the gene expression that shapes cell identity, there is an alternative view held by some researchers, that SEs are no more than clusters of enhancers (11,12) that contribute with additive effect on their target genes in a manner more similar to previously described locus control regions (LCRs) (13). Therefore, in view of this controversy, we consider it is timely to review our current knowledge of SEs and discuss the evidence in support of the range of opinions. In this work, we also recapitulate genome-wide identification and characterization of SEs and provide an online repository of a high-quality collection of SEs with meta-analysis. Furthermore, we explore the biological support for a role of SEs in gene regulation in light of the phase separation and three-dimensional (3D) genome organization models for SE action. We also propose an evolutionary framework to explain the emergence of SEs in complex organisms. Finally, we will discuss the involvement of SEs in human health disorders and their potential as targets for therapeutic interventions.

GENOME-WIDE IDENTIFICATION OF SUPER-ENHANCERS
Young and colleagues originally identified SEs at the genome-wide scale, based on ChIP-seq signal enrichment of Mediator subunit MED1 or master TFs, such as MyoD, T-bet and C/EBPα, in mouse embryonic stem cells (mESCs) and other cell types including myotubes, T helper cells and macrophages (6,7). A similar strategy was also applied to other cell types using ChIP-seq data of acetylated histone H3 lysine 27 (H3K27ac), which is a surrogate epigenomic marker of active enhancers (8). The Young group also developed the ROSE software tool to facilitate SE identification in silico (7,8). This algorithm stitches closely-distributed enhancers identified from H3K27ac (or MED1/master TF) ChIP-seq data, ranks the stitched enhancers by their input-subtracted ChIP-seq signal, and finally separates SEs from typical enhancers by a graphic elbow point identified on the ranked ChIP-seq signal plot (Figure 1A). The output is slightly different for the different kinds of data input, such that the elbow points are usually sharper with MED1 than H3K27ac, and the final SE collections identified by the two marks are not in 100% agreement. To exclude the possibility of transcription start sites (TSS) overlapping with regions of SE calling, constituent enhancers are usually excluded from stitching if they are located within a ±2000 bp window flanking an annotated TSS (8).
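The stitch, rank and elbow steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the ROSE source code: the function names (stitch, elbow_cutoff, call_super_enhancers) are hypothetical, the input is assumed to be a list of enhancers on a single chromosome with pre-computed input-subtracted signal, the 12.5 kb stitching distance is the cut-off attributed to the original SE paper, the elbow is approximated geometrically as the point where the rescaled ranked-signal curve has a tangent of slope 1, and the TSS exclusion step is omitted.

```python
from typing import List, Tuple

# (start, end, input-subtracted ChIP-seq signal); one chromosome assumed
Enhancer = Tuple[int, int, float]


def stitch(enhancers: List[Enhancer], max_gap: int = 12_500) -> List[Enhancer]:
    """Merge enhancers separated by at most max_gap bp, summing their signal."""
    stitched: List[Enhancer] = []
    for start, end, signal in sorted(enhancers):
        if stitched and start - stitched[-1][1] <= max_gap:
            p_start, p_end, p_signal = stitched[-1]
            stitched[-1] = (p_start, max(p_end, end), p_signal + signal)
        else:
            stitched.append((start, end, signal))
    return stitched


def elbow_cutoff(signals: List[float]) -> float:
    """Return the signal at the elbow of the ranked-signal curve.

    Rank (x) and signal (y) are both rescaled to [0, 1]; the elbow is taken
    as the point that maximises x - y, i.e. where a line of slope 1 touches
    the curve from below. This is an approximation chosen for illustration.
    """
    ranked = sorted(signals)
    n = len(ranked)
    if n < 2 or ranked[-1] <= 0:
        raise ValueError("need at least two regions with positive signal")
    top = ranked[-1]
    best = max(range(n), key=lambda i: i / (n - 1) - ranked[i] / top)
    return ranked[best]


def call_super_enhancers(stitched: List[Enhancer]) -> List[Enhancer]:
    """Stitched regions whose signal lies above the elbow are called SEs."""
    cutoff = elbow_cutoff([signal for _, _, signal in stitched])
    return [region for region in stitched if region[2] > cutoff]
```

In this sketch a region is called an SE only if its signal is strictly above the elbow value, so no size threshold is supplied by the user; the separation comes from the shape of the ranked curve itself.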
To the best of our knowledge there are currently three SE databases which gather published SEs and implement the ROSE algorithm to mine available ChIP-seq data, including dbSUPER (14), SEA (15), and SEdb (16). The most recent of these, SEdb, contains a collection of more than 331,000 SEs derived from 541 human cell lines/tissues. We also provide an online data repository of SE data, including a core collection of human SEs with comparative and exploratory analyses (discussed below in this Survey and Summary) to further support the biological investigation of these structures. This resource is available at https://sunlightwang.github.io/Super-Enhancers/ and will be continuously updated and expanded going forward.

SEs are comprised of a small number of genomic loci of extremely large size
In a comparison of SEs and typical enhancers (TEs) in 30 cell lines, 24 tissues and 11 primary cell types available from the ENCODE project (10), it was noted that the median size of SEs in general spreads from 10 kb to over 60 kb, whereas the median size of TEs ranges from 1 kb to 4 kb, smaller by approximately one order of magnitude (6,7) ( Figure 1B, upper). By contrast, when looking at the number of SEs and TEs in each cell type, the trend is exactly the opposite: SEs are fewer than TEs by one to two orders of magnitude (Figure 1B, bottom).
Around the time that these SEs were described, another group independently reported enhancers of size >3 kb, and used the alternative nomenclature 'stretch enhancer' (StrE) to characterize their extraordinary length (17). Similar to SEs, StrEs are also found to be cell type specific and important in programming cell identity gene expression (17). Although SEs and StrEs share some properties, they are conceptually and functionally different in at least two respects. Firstly, while StrEs are determined by an arbitrary cut-off in genomic size (3 kb), SEs are discriminated from other enhancers in a parameter-free manner after stitching of clustered enhancers (Figure 1A), which gives more weight to the biological essence of SEs. Secondly, extraordinarily strong TF binding and associated Mediator complex signals endow SEs with special biochemical properties, for example, enabling them to form liquid-liquid phase-separated condensates (discussed later in this article). By contrast, any large enhancer can be designated a StrE, regardless of its biological activity.
In a comparison of SEs with TEs and StrEs in eight cell lines where collections of all three enhancer types are available, both the small number and the extremely large size of SEs become more apparent (Figure 1C). For example, the median size of StrEs is on the order of a thousand base pairs, 2- to 3-fold larger than that of TEs, but still much smaller than their SE counterparts. While SEs are usually stitched together from constituent enhancers, between which there can be gaps of up to 12.5 kb (a cut-off used in the original SE paper), StrEs are defined as large enhancers or enhancer-like chromatin states based on hidden Markov model inference. Therefore, some caution should be applied where large gaps may prevent an account of synergistic influences from nearby domains in a StrE. When looking at the number of the three types of enhancers in each cell line, the StrEs are numerous, much more comparable to TEs than SEs (Figure 1C, D).

SEs specify cell identity
Within the high-quality SE collections derived from the ENCODE project, including the data from 65 samples described above, we charted the genome-wide SE landscape as shown in Figure 2A. The vast majority of SEs only appear in a few or individual cell types (Figure 2B), supporting the assertion that SEs are highly cell type specific. To further explore this specificity, we performed hierarchical clustering based on the Jaccard distance that measures the SE location dissimilarity between every pair of the 65 samples. A larger Jaccard distance is indicative of fewer overlapped SEs relative to the total number of all SEs occurring in the two samples being compared, and vice versa. As shown in Figure 2C, the hierarchical clustering indicates a clear separation between tissue samples (except for only three outliers) and cell lines / primary cells, indicating that SEs are also sensitive to the cellular growth environment. Indeed, environment-specific SEs were observed in resident macrophages (18). Principal component analysis (PCA) based on the SE occurrence matrix of the 65 samples also confirmed the same phenomenon, with an even clearer segregation of tissue samples from the others (Figure 2D). Primary cells scatter between tissue samples and cell lines when looking at the first component (i.e. the x-axis of Figure 2D), which accounts for nearly one third of the variance, confirming that both the growth potential and immortality of primary culture cells lie in between the tissue samples and cell lines. Intriguingly, the PCA visualization of the 65 samples based on SE occurrence is analogous to the analysis based on global gene expression, where cell lines show distinct gene expression profiles compared to normal tissues (19). It is also interesting to note that sub-clusters comprised of only primary cells or cell lines are observed, despite the fact that they are, in general, more intermingled (Figure 2C, D).
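As an illustration of the distance used for the clustering above, the following sketch computes the Jaccard distance between SE collections. It assumes, purely for illustration, that overlapping SEs from different samples have already been assigned a shared locus identifier, so that each sample can be represented as a set of identifiers; how overlaps were resolved in the actual analysis is not specified here, and the function and sample names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A and B| / |A or B|; larger values mean fewer shared SEs."""
    union = a | b
    if not union:
        return 0.0  # two empty collections are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(samples: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard distance for every unordered pair of samples."""
    return {
        (s, t): jaccard_distance(samples[s], samples[t])
        for s, t in combinations(sorted(samples), 2)
    }


# Hypothetical toy input: three samples described by shared SE locus identifiers.
toy_samples = {
    "tissue_A": {"se1", "se2", "se3"},
    "tissue_B": {"se2", "se3", "se4"},
    "cell_line_C": {"se5"},
}
toy_distances = pairwise_distances(toy_samples)
```

The resulting pairwise distances could then be passed to any standard hierarchical clustering routine; that step is left out to keep the sketch self-contained.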
When looking at a finer resolution in Figure 2C, cell lines derived from similar cell origins (e.g. prostate or blood) tend to be grouped together immediately, which is consistent with literature reports (8). Such a phenomenon is also observed in tissues developed from a common pathway (e.g. thoracic aorta, ascending aorta and coronary artery).
These observations suggest that the SE profile can be used as a biomarker or proxy to categorize cell types and their growth conditions or status. Moreover, the fact that SEs differ among the same cell type in different niches also suggests that they are sensitive to external environmental signals (18). SEs are now becoming commonly used as molecular markers to sub-classify complex diseases in precision medicine, and this will be discussed later in this article.

SEs are densely bound by master TFs
It has been shown that enhancers can interact with promoters via long-range chromatin loops to activate gene transcription (20). These structures are mediated and stabilized by the complex cooperation between an array of cis-acting regulatory elements and trans-acting proteins that co-localize in each chromatin domain. In both human and eukaryotic model organisms, TFs are found densely bound to enhancers in a combinatorial manner and synergistically responsible for regulatory activity (21)(22)(23)(24)(25)(26). SEs are densely co-occupied by master TFs and Mediator complex, and these master TFs establish auto-regulatory networks (6,27,28). In mouse ESCs, the three well-known pluripotent TFs, Oct4, Sox2 and Nanog are identified at SEs at extraordinarily elevated levels (6). In comparison to StrEs, SEs have a significantly higher level of chromatin accessibility surveyed by DNase I hypersensitivity assay and also denser master TF binding signals (29).
Interestingly, TF binding sites inside SEs are not evenly distributed but form a series of dense clusters, termed 'constituents' (6), 'epicenters' (30) or 'hotspots' (27). In a more dynamic system, Adam and colleagues took advantage of the hair follicle stem cell differentiation model, and observed the remodeling of SEs and their epicenters during lineage commitment (30). The cohort of TF binding sites was also altered following the change of SE locations. The location alteration was reversible and remarkably sensitive to the microenvironment, allowing high plasticity for cells to adapt to the environment for proliferation and lineage commitment. In this system, the pioneer factor SOX9 took the major responsibility for governing the landscape of SEs by protecting against H3K27me3 (histone H3 lysine 27 trimethylation, a marker for inactive chromatin) and initiating enrichment of H3K27ac. In the aforementioned macrophage example, macrophages residing in five different tissue environments share only ∼40-50% of SEs (18). The pioneer factor PU.1 is a fate-determining TF that transmits the environmental signal to the commitment of environment-specific SEs in macrophages.
Several lines of evidence suggest that master TFs are capable of directing active SE formation (18,30,31) and this is supported by extensive co-localization of TFs within SE 'hotspot' regions either sequentially or simultaneously in response to developmental stimuli (27). Hnisz et al. demonstrated that terminal TFs of multiple signaling pathways frequently co-occurred in SEs but not in their smaller TE cousins, suggesting that SEs may provide a 'hub' for cells to be hyper-sensitive to divergent environmental signals (32).

High occupancy of mediator at SEs
Defined by the enhancer clusters with high Mediator occupancy and/or strong enhancer activity, SEs are expected to largely overlap with known functional chromatin domains, such as differential methylation regions (DMRs), locus control regions (LCRs) and transcription initiation platforms (TIPs) (8). Compared to stretch enhancers and typical enhancers, SEs exhibit significantly higher enrichment of active chromatin marks and binding of chromatin remodelers and organizers (29). The Mediator complex, composed of 26 core subunits, is known to be associated with enhancers and mediate enhancer functions by transmitting regulatory signal to the associated transcription machinery (33)(34)(35)(36). Mediator contributes to stable assembly of transcription pre-initiation complex, regulation of RNA Polymerase II (PolII) pausing and elongation, and the formation of enhancer-promoter looping and three-dimensional (3D) genome organization (reviewed in (34)). SEs are heavily loaded with the Mediator complex at least one order of magnitude greater than any typical enhancers (6), giving rise to extraordinary activity and specific biochemical characteristics (7).
Altogether, SEs demonstrate their 'superness' not only by their enormous size, achieved through clustering closely localized enhancers, but also by their super-strong transcriptional activity due to dense interaction with transcription factors, chromatin remodelers, transcription co-activators, and Pol II holoenzyme (Figure 1D). These properties are further supported by their capacity to drive short and long range interaction through phase separation and 3D genomic association, highlighting the qualitative difference to typical enhancers (see below), and suggest a mechanism to explain why transcriptional activation by SEs is greater than the sum of their constituent enhancer components (12).

THE ROLE OF SUPER-ENHANCERS IN TRANSCRIPTIONAL REGULATION
Given the unique characteristics of SEs, it is interesting to speculate why complex cells need them. The most likely answer to this question is that SEs specify and maintain cell identity, which is a vital biological attribute of complex multicellular organisms that need to be able to developmentally regulate the formation and maintenance of cell type specific tissue compartments. While mechanisms for transcriptional regulation through promoter activation were first identified in unicellular prokaryotes (37), the level of sophistication in vertebrate systems probably required more mechanistic complexity including long-range chromatin interactions from enhancers to target genes (20). Cell type specificity, in particular, has been shown to be associated with both master TFs and epigenetic chromatin marks. However, very little is known about how tissue-specific enhancers and, in particular, SEs emerge to support lineage commitment during development (38).

Phase separated condensates formed at SEs
SEs are extraordinarily densely bound by master TFs and highly occupied by the Mediator complex and other transcriptional coactivators, which raises the very interesting possibility that the genome can generate unusual physicochemical properties at their site of action. For example, the Young lab has recently shown that the transcriptional coactivators BRD4 and MED1 are components of liquid-liquid phase-separated transcriptional condensates (39,40). They provided additional evidence that BRD4 and MED1 condensates co-localized with SEs (39) (Figure 3A). In a parallel study, Cho et al. showed further evidence that Mediator and Pol II co-existed in stable subcellular compartments, forming condensates associated with SE elements (41). In both studies it was demonstrated that the formation of phase-separated liquid condensates was impaired in cells treated with 1,6-hexanediol, a compound known to disrupt these complexes (39,41). In addition, ChIP-seq with antibodies against BRD4 and MED1 revealed that treatment with 1,6-hexanediol resulted in reduced BRD4 and MED1 binding at enhancers, and the effect was more profound at SEs (39). Mechanistically, SE-bound proteins such as BRD4 and MED1 typically contain large intrinsically disordered regions (IDRs), which multivalently but weakly interact with a large number of TFs and cofactors that also contain IDRs (40,42,43). The multivalent interaction between IDRs facilitates condensation and liquid-liquid phase separation (LLPS) (39,44,45), where the highly concentrated transcriptional machinery guarantees the robust expression of cell identity genes. In support of this observation, Gibson et al. experimentally observed that adding BRD4 promoted formation of a new liquid phase of acetylated chromatin (46). This is in concordance with related work in the field suggesting that the phase separation process plays a prominent role in 3D genome organization and is involved in organizing cell identity (47).
[Figure 3, partial caption. (B) ... (50). While SEs reach a high transcriptional activity at relatively low valency, typical enhancers need higher valency to achieve the same transcriptional activity (figure adapted from (50) with permission from Elsevier). (C) A schematic chart showing the energy homeostasis in the formation of LLPS at SEs. Structured interactions and multivalent interactions provide energy to compensate for the loss of entropy when LLPS forms and the system becomes ordered.]
The involvement of SEs in the formation of LLPS also supports the assertion that SEs contribute more to transcriptional regulation than the additive effect of their multiple component TEs. More broadly it has been suggested that membraneless cellular organelles formed via phase separation play critical roles in a variety of cellular processes (reviewed in (48)). The localized protein concentration in these compartments is thought to be vital for the formation of phase-separated liquid droplets (49). The hub-like characteristics of SEs make them ideal biomolecular substrates to form phase-separated condensates comprising highly cooperative TFs, chromatin remodelers, transcription co-activators and RNA Pol II that elevate the local density to ∼10-fold the molecular density of the components at typical enhancers (6)(7)(8). As a result, Sharp and colleagues hypothesized that formation of a phase-separated assembly was more likely to occur at SEs than at TEs (50). Based on this hypothesis, they proposed a kinetic model of transcriptional control, and explored the dynamic properties of transcriptional activity by varying the number (N) of interacting molecules (i.e. the enhancer element and its associated factors) in a fixed-volume system, and the valency of each molecule. In this model, the transcriptional activity was quantitatively approximated by the relative size of the largest molecule cluster connected via cross-linking interactions (i.e. maximum size of clusters/N). When the transcriptional activity approached the value of 1, all molecules in the system formed a single interacting cluster, very likely resulting in the phase-separated assembly (50). By modeling SEs as a system containing 50 molecules and typical enhancers as a system consisting of 10 molecules, they revealed a divergent relationship between transcriptional activity and the change of valency between the two systems (Figure 3B). Where SEs reached a high transcriptional activity at low valency, typical enhancers needed higher valency to achieve the same transcriptional activity. This result suggests that SEs may undergo phase separation at a lower level of valency than typical enhancers. Moreover, a steeper increase of transcriptional activity of SEs was observed (Figure 3B), indicating that SEs likely behaved as a binary switch in regulating key gene expression and could rapidly facilitate the establishment and maintenance of cell identity.
In addition, the model also successfully explained a number of important observations in enhancer-mediated transcriptional control, including the transcriptional bursting patterns of enhancers and the ability of SEs to simultaneously activate multiple genes (50).
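The activity measure used in this model (largest cross-linked cluster divided by N) can be illustrated with a toy simulation. The sketch below is not the kinetic model of (50): it is a simple random cross-linking scheme written for illustration, in which every pair of molecules in the fixed volume is given one chance to bond with a probability link_prob (a hypothetical parameter standing in for encounter frequency), provided both partners still have free valency. The function names and all parameter values are assumptions.

```python
import random
from itertools import combinations


def largest_cluster_fraction(n_molecules: int, valency: int,
                             link_prob: float, rng: random.Random) -> float:
    """Size of the largest cross-linked cluster divided by n_molecules."""
    parent = list(range(n_molecules))   # union-find forest
    size = [1] * n_molecules            # cluster size stored at each root
    free = [valency] * n_molecules      # remaining binding sites per molecule

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = list(combinations(range(n_molecules), 2))
    rng.shuffle(pairs)  # visit candidate pairs in random order
    for a, b in pairs:
        if free[a] > 0 and free[b] > 0 and rng.random() < link_prob:
            free[a] -= 1
            free[b] -= 1
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                if size[root_a] < size[root_b]:
                    root_a, root_b = root_b, root_a
                parent[root_b] = root_a
                size[root_a] += size[root_b]
    return max(size[find(i)] for i in range(n_molecules)) / n_molecules


def mean_activity(n_molecules: int, valency: int, link_prob: float = 0.3,
                  trials: int = 200, seed: int = 0) -> float:
    """Average the activity proxy over repeated random trials."""
    rng = random.Random(seed)
    total = sum(largest_cluster_fraction(n_molecules, valency, link_prob, rng)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # N = 50 stands in for an SE and N = 10 for a typical enhancer, as in the text.
    for valency in range(1, 7):
        print(valency, mean_activity(50, valency), mean_activity(10, valency))
```

Running the script prints the activity proxy for both system sizes at each valency, which allows the two curves to be compared; the quantitative behaviour depends on the assumed link_prob and should not be read as a reproduction of Figure 3B.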
In terms of energy homeostasis, Chakraborty et al. proposed that the formation of LLPS drives the chromatin into a more ordered state and therefore leads to a loss of entropy. This biophysical change requires compensation from an energetic gain to maintain stable condensates. Where condensates form at enhancers, there are two ways to achieve this: (i) strong, structured master TF-DNA interactions and (ii) weak multivalent protein-protein (mostly TF-coactivator) interactions via IDRs (51). Therefore, SEs, exceeding other regulatory elements in both master TF binding sites and co-activator concentration, are more prone to the formation of transcriptional condensates (Figure 3C).
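The entropy compensation argument can be written compactly using the standard Gibbs free energy relation. The symbols below are introduced here for illustration and do not appear in (51): \Delta G, \Delta H and \Delta S are the free energy, enthalpy and entropy changes on condensate formation, T is the temperature, and the split of \Delta H into a structured (TF-DNA) and a multivalent (IDR-mediated) term is an assumed simplification.

```latex
\begin{align}
  \Delta G &= \Delta H - T\,\Delta S < 0
    && \text{(condition for stable condensate formation)} \\
  \Delta S &< 0
    \;\Longrightarrow\; \Delta H < T\,\Delta S < 0
    && \text{(ordering must be paid for by an energetic gain)} \\
  \Delta H &\approx \Delta H_{\text{structured}} + \Delta H_{\text{multivalent}}
    && \text{(assumed decomposition into the two interaction types)}
\end{align}
```

Under these assumptions, a locus with more master TF binding sites and a higher co-activator concentration makes both enthalpic terms more negative, which is one way to read the statement that SEs are more prone to condensate formation.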

3D genome organization concerning SEs and their targets
The 3D genome organization has been shown to play critical roles in gene regulation and cell functions, also exhibiting cell-type specificity (47,52,53). In particular, insulated neighborhoods within chromosomal loop structures formed by CTCF-mediated interaction of two chromatin domains, provide a powerful paradigm for precise gene expression control (54). In other words, the superior transcriptional activity of SEs has to be strictly restricted within insulated neighborhoods such that they can be precisely and specifically tethered to their target genes. Even compared to stretch enhancers, relatively higher occupancy of cohesin and CTCF, the two factors mediating long-range DNA interaction and looping, has been found in constituents of SEs (29), supporting the notion that the chromosomal loops connecting SEs and promoters are more strictly controlled and maintained. An important question related to this is how the cell identity genes and their associated SEs are organized throughout the entire genome at the nucleotide sequence level?
Nucleic Acids Research, 2019, Vol. 47, No. 22 11487
The human genome contains many gene-poor regions, called gene deserts, that range in size from 5% to 40% of the entire chromosome (55). These segments, often referred to as 'junk DNA' and comprising 716 Mb, can be classified into two different categories based on their sequence conservation: stable gene deserts (>2% conserved) and variable gene deserts (<2% conserved). Intriguingly, gene ontology (GO) analysis shows that the tissue specific or developmentally regulated genes are moderately expressed and most of them are located in the stable gene deserts (55,56). In addition to the key cell identity gene bodies, their cis-regulatory elements are also embedded throughout the gene deserts. Comparative analysis has revealed that the density of transcriptional regulatory elements is three times higher in stable gene deserts than in variable gene deserts and other intergenic regions (55). These cis-regulatory elements are generally linked to the neighboring genes, supported by a substantial depletion of synteny breakpoints in between. For example, the murine ESC pluripotency gene Sox2 locus at chromosome 3qA3 is flanked by a 1.5 Mb stable gene desert and regulated by its SE 130 kb downstream of the gene body in the same gene desert (57,58,59) (Figure 4A). Another case is the human proto-oncogene MYC locus, which is located within a 1.2 Mb stable gene desert at chromosome 8q24 and is regulated by a few distinct SEs in different tissues. Genome-wide association studies (GWAS) of multiple cancer and metabolic disease cohorts have identified single nucleotide polymorphisms (SNPs) in the gene deserts tightly linked to high risks of breast cancer, colorectal cancer, prostate cancer, ovarian cancer and others (summarized in (60)). Some of these SNPs are located inside TF binding clusters (21), which are recognized as constituents of MYC gene SEs (8,32,61). A few more gene loci with similar genome arrangement have been reported, including loci of KLF4, OTX2 and DACH1 (62)(63)(64).
These observations suggest that the key genes tend to be organized independently from other chromosomal territories, which is likely to be evolutionarily favored for at least two non-mutually exclusive reasons: (i) the enclosed genes are essential for cell identity and function, and thus require precise expression without interference from signals other than those of their own SEs; (ii) SEs are so powerful that they need to be prevented from unexpectedly driving unrelated neighboring genes. Interestingly, in ESCs, Dowen et al. found that these cell identity genes not only occurred in gene deserts but were also restricted within insulated neighborhoods enclosed by CTCF and its associated cohesin, termed SE domains (SDs) (65). The same phenomenon was described in a study on T-lymphoblastic leukemia (TLL) cells, where an SE was restricted within the same CTCF-organized neighborhood as its target locus IL7R, a key gene for normal T cell development and the pathogenesis of TLL (66). Deleting a CTCF binding site at one border of an SD caused dysregulation of the internal cell identity genes and activation of external genes near the SD (64,65). Disruption of the insulated chromatin neighborhoods that repress proto-oncogene expression could activate oncogenes and lead to severe health problems, further supporting a role of SDs in the precise control of essential gene expression (67).
The transcriptionally insulated neighborhoods, at a median size of ca. 200 kb, are formed by flanking CTCF binding sites in convergent orientations (64,65) (Figure 4B). Similar genome organization is observed as 'contact domains' (median size: 180 kb) identified from deeply sequenced in situ Hi-C data (68). The 'contact domains' are associated with distinct patterns of histone modifications across the domain borders. Taken together, this strongly suggests that 'contact domains' (by in situ Hi-C) and 'insulated neighborhoods' (by ChIA-PET) may refer to the same 3D chromatin structure units, or at least partially overlapping ones (64). When comparing the Hi-C data and ChIA-PET pairs carried out in the same primed human ESC (H1) line, Ji et al. found that the insulated neighborhoods were the fundamental units (subdomains) that constituted the megabase-sized topologically associating domains (TADs), which were also mediated by cohesin-associated CTCF-CTCF loops. Thus, they proposed a model in which the genome is partitioned into many large and physically close TADs, which are constituted by multiple transcriptionally insulated neighborhoods. These subdomains delimit the effective range of enhancers, SEs and repressors (64).
However, forming the 3D genome configuration is a complex and dynamic process, which has yet to be completely understood (69). With combined efforts from multiple disciplines, a few models of transcriptional regulation and genome organization have been proposed. For example, a recent model based on physical principles suggests that the formation of clusters of active RNA polymerase and associated TFs is the elementary feature of 3D chromatin structures. These clusters are surrounded by DNA loops, and thus the large domains, TADs and chromatin A/B compartments are simply single or multiple clusters with loops. This model also explains the extraordinary activity of SEs in transcriptional regulation, because Mediator and master TFs bound to SEs would increase the time that an associated promoter stays close to the active polymerase clusters (70).

A multi-stage transition model of the SE formation
Experimental evidence supports the view that binding of master TFs not only attracts more TFs, but also facilitates the recruitment of chromatin remodelers and transcriptional coactivators that possess enzymatic activity for histone modifications (71)(72)(73)(74)(75)(76). Early biochemical studies uncovered that a single TF, C/EBPβ, could compete against histones for DNA binding and mono-nucleosome formation (77), which was also confirmed at the chromatin level. A single master TF, HNF3 or GATA4, could compete against assembled nucleosome arrays (chromatin fibers) with assistance from its C-terminal DNA binding domain, and consequently make the DNA accessible to other proteins including chromatin remodelers and modification enzymes (78). In a recent study, it has been found that immediately after DNA replication in each cell cycle, nucleosomes are repositioned at promoters and enhancers, followed by re-establishment of DNA accessibility led by TFs and chromatin remodelers in a fast mode (accomplished in hours) (79). As a result, phase-separated 'transcription factories' are formed around the protein-crowded chromatin foci, which can also be stably inherited through cell division (80)(81)(82)(83). Such 'factory' organization greatly increases the efficiency with which multiple genes are coordinately regulated, and is supported by the recent observation that promoter-promoter interactions make up 42% of total long-range interactions (84).
In some cases, one TF may be insufficient for nucleosome exclusion and chromatin opening, requiring instead cooperativity of multiple master TFs (85)(86)(87). For example, temporally persistent hierarchical binding of SOX2/OCT4/KLF4 prior to c-MYC was found in mammalian ESCs (87). Despite the importance of a cluster of TF binding for enhancer activity, the TF-DNA binding itself remains short-lived and highly dynamic. Chen et al. took advantage of super-resolution microscopy-based single-molecule tracking (SMT) technology and observed that TFs employed a 3D diffusion-dominated searching mode assisted by 1D sliding to locate their target sites, which took over 6 min (88). By contrast, the residence time of SOX2 at its stable recognition site lasted for only ∼12-14 s. These results suggest that TF binding alone cannot fully explain the establishment of enhancers, and that more stable and inheritable events are required, although TF binding per se is an initial step.
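To make the contrast between search time and residence time concrete, a rough occupancy estimate can be derived from the two figures quoted above. This is an illustrative back-of-the-envelope calculation and not a result reported in (88): it assumes an idealised cycle in which a single TF molecule alternates between searching (\tau_{\text{search}} > 360 s) and residing at one specific site (\tau_{\text{res}} \approx 12\text{-}14 s); both symbols are introduced here.

```latex
\begin{align}
  f_{\text{bound}}
    &= \frac{\tau_{\text{res}}}{\tau_{\text{search}} + \tau_{\text{res}}} \\
    &< \frac{14\ \text{s}}{360\ \text{s} + 14\ \text{s}}
     \approx 0.037
\end{align}
```

Under these assumptions a single molecule would occupy a given site for less than about 4% of the time, which is consistent with the argument that transient TF binding needs to be followed by more stable events.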
Collectively, we introduce a 'multi-stage transition' model to describe the stepwise evolution of the regulatory landscape from TF-DNA binding to SE formation (Figure 5). The initiation of the regulatory element takes place when a master TF searches for its target site and competes against local nucleosomes for stable binding (Stage I). To achieve a more stable state, such an event must be followed by recruitment of chromatin remodelers and histone modification enzymes to re-organize and maintain a more accessible and heritable chromatin environment (Stage II). Over the evolution of cellular function, a few key genes demand synergistic regulation from multiple convergent regulatory elements and thus drive the formation of SEs (Stage III), the final destination for active transformation of cis-regulatory components (Figure 5).
A variety of evolutionary pressures separate the formation of SEs from that of TEs at different stages of the model. During Stages I and II, stable binding of a master TF is initially required to compete with nucleosomes; this is followed by the binding of other transcriptional activators and chromatin remodelers. As shown before in other examples, master TF binding at TEs is far less prevalent than at SEs. The lack of abundant master TF binding leads to a reduced enrichment of chromatin remodelers and co-activators (Stage II). The binding of BRD4 can facilitate the formation of LLPS between acetylated and unacetylated chromatin (46). Upon establishment of LLPS, the nascent SEs become more stable than other foci. In the last stage, selection pressure eventually 'selects' the most functional SEs: (i) if a gene driven by a new SE is toxic to the cells, the corresponding SE becomes deleterious, experiences negative selection and tends to be removed from the cell population; (ii) when a gene driven by an SE is non-essential for cell function, neutral genetic drift will dilute its presence in the cell population. Therefore, only the key genes that equip cells with proper functions will be positively preserved and retain their corresponding SEs during evolution. Analogously, a selective process could help oncogenes quickly develop SEs that confer an uncontrolled growth advantage, which suggests a mechanism that could drive tumor cells to acquire SEs during tumor pathogenesis (8).
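The three selective outcomes described for the last stage (removal of a toxic SE, dilution of a non-essential SE by drift, and retention of a beneficial SE) can be illustrated with a toy Wright-Fisher-style simulation. This is only a sketch for intuition and is not part of the model in Figure 5: the population size, starting frequency, number of generations and selection coefficients are arbitrary assumptions, and all function names are hypothetical.

```python
import random


def simulate_se_frequency(s, n_cells=200, p0=0.1, generations=100, seed=0):
    """Final frequency of SE-carrying cells in a fixed-size population.

    s is the (assumed) selection coefficient of carrying the SE:
    s < 0 toxic, s = 0 non-essential (pure drift), s > 0 beneficial.
    """
    rng = random.Random(seed)
    carriers = round(p0 * n_cells)
    for _ in range(generations):
        p = carriers / n_cells
        if p in (0.0, 1.0):  # SE lost or fixed: nothing more can change
            break
        # Expected carrier frequency after selection (relative fitness 1 + s).
        expected = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Binomial resampling of the next generation models genetic drift.
        carriers = sum(rng.random() < expected for _ in range(n_cells))
    return carriers / n_cells


def mean_final_frequency(s, replicates=40):
    """Average final SE frequency over independent replicate populations."""
    runs = [simulate_se_frequency(s, seed=i) for i in range(replicates)]
    return sum(runs) / replicates


toxic = mean_final_frequency(-0.2)     # assumed deleterious SE
neutral = mean_final_frequency(0.0)    # assumed non-essential SE
beneficial = mean_final_frequency(0.2) # assumed beneficial SE
print(f"toxic: {toxic:.2f}, neutral: {neutral:.2f}, beneficial: {beneficial:.2f}")
```

In this toy setting the toxic SE is purged, the beneficial SE sweeps through the population, and the neutral SE wanders around its starting frequency and is lost in many individual replicates, which mirrors the qualitative argument above.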

SUPER-ENHANCERS AND DISEASES
SEs and their defects have been linked to multiple genetic diseases (8), including cancer (7,89), metabolic (8,90,91) and immune diseases (92) (reviewed in (93)). In this section, we will discuss current knowledge of SE-associated human health conditions, and the potential applications for disease diagnosis, prognosis, and treatment.
The vast majority of risk SNPs that confer genetic diseases rarely alter protein coding but mostly reside in noncoding regulatory loci (94,95). A significantly larger repertoire of disease-associated mutations has been found in SEs compared to other open chromatin regions, such as promoters and typical enhancers (8). In some cases, small mutations and indels have been found to unexpectedly generate new SEs or rewire SEs to oncogenes that drive tumor pathogenesis (8,67,89,96).

Cancers
SEs possess the capability of not only specifying cell identity but also maintaining cancer cell identity and discriminating carcinoma subtypes (97). The application of SE analysis to medulloblastoma is one of the first cases showing an efficient way of using epigenetic data to trace the cellular origin of poorly characterized cancer malignancies (98,99). By querying their respective SEs together with their associated master TFs, Lin et al. managed to (i) locate the novel targets of the aberrant transcriptional system in different cancer subtypes and (ii) more importantly, identify the cell-of-origin for cancer subgroups.
Figure 6. (A) Model 1: Due to mutation occurrence, a MYB binding motif is generated next to a silent oncogene, and the motif recruits MYB protein binding. Following the multi-stage transition model, the master TF binding accumulates more stable signals of other TFs, chromatin remodelers and histone modifications, resulting in formation of a stable and strong SE. Hence, the adjacent oncogene is activated (black arrow) by the newly acquired SE and causes oncogenesis. (B) Model 2: Unlike Model 1, the mutation does not create a novel binding site for master TFs but erases a binding site of CTCF, which is also an anchor site for an insulated neighborhood (brown circle). The formerly silent oncogene is activated by a juxtaposed SE and consequently results in tumorigenesis. Note that CTCF/cohesin binding site mutation is significantly enriched in the cancer genome. (C) Model 3: The activation-induced cytidine deaminase (AID) is accumulated at the largely permissive chromatin surrounding an SE by convergent transcripts from enhancer RNA and mRNA transcribed from the SE and gene body, respectively. Binding of AID triggers genome instability and concomitant translocation, which brings an oncogene next to the SE and promotes lymphomagenesis.
Multiple cancer types show either a prominent mutation rate or a distorted regulatory landscape at the SEs of driver oncogenes in disease-relevant cell types. One particularly interesting example was discovered in T-cell acute lymphoblastic leukemia (T-ALL). For a long time, the helix-loop-helix TF TAL1 has been associated with bi-allelic activation in some T-ALL patients, which could be caused by loss of function of an upstream repressive regulatory element (100). In addition to these individuals with bi-allelic ectopic expression, a substantial number of patients and cell lines were found to carry mono-allelic overexpression of TAL1 (100). When the same team revisited the T-ALL subgroup with mono-allelic activation of TAL1, they discovered that the aberrant allele had acquired a 20-kb long SE accommodating binding sites for a few major leukemogenic TFs, including RUNX1, GATA-3 and TAL1 itself (96). Strikingly, this oncogenic alteration was caused simply by a 12 bp insertion, which generated a new MYB binding site that did not exist in cells that had not gained the SE (Figure 6A).
Mutations or insertions that inappropriately activate enhancers to reboot a repressed disease driver gene are not a novel finding in the pathogenesis of malignancy. In fact, they are particularly common at the MYC locus (101,102). Whole-genome sequencing analysis of the widely studied human cervical cancer cell line HeLa uncovered an active genomic fragment from Human Papilloma Virus type 18 (HPV-18) inserted into an originally silent constituent of a MYC super-enhancer, which thereafter drove upregulation of the proto-oncogene MYC, promoting oncogenesis (103). However, in the TAL1 example, the creation of a broad SE by a short insertion that generates a de novo MYB binding site is rather unexpected. MYB binding itself is dynamic and thus may not be sufficient on its own to boost TAL1 expression. However, MYB, as a pioneer factor, draws in chromatin remodelers, coactivators and other TFs to establish a relatively stable regulatory niche. Accumulation of more and more executive activation factors at the mutated locus promotes formation of an SE that is powerful enough to drive oncogenic TAL1 expression and subsequent carcinogenesis. This model provides a novel mechanism for tumorigenesis through the gain of binding sites for master TFs, which results in greater-than-expected gene expression and upregulation of oncogenes.
Cancer-associated mutations are also frequently detected at CTCF- and cohesin-bound sequences, which are also enriched in SEs (104). Using whole-genome sequencing of 231 colorectal cancer (CRC) patient tumors and paired normal intestinal tissues, Katainen et al. reported that the mutation frequency in CTCF binding sites (CBSs) was elevated by an order of magnitude compared to the genome average. Through further scrutiny of public cancer genome databases, they found that CBSs were somatic mutational hotspots in the non-coding cancer genome of multiple tumor types. These mutated CBSs overlapped extensively with the CTCF-CTCF anchor sites that form insulated neighborhoods in human ESCs. This is also consistent with another study by Hnisz et al., in which microdeletions caused loss of boundary CTCF binding and consequently disrupted the insulated neighborhoods that kept oncogenes repressed in T-ALL (67) (Figure 6B).
During the process of somatic hypermutation and class-switch recombination, B cells have to undergo a series of DNA double-strand breaks and highly frequent mutations triggered by activation-induced deaminase (AID) (105). Such a process can aberrantly cause genome instability and malignancies. AID, which does not have any DNA sequence binding motifs, has been shown to bind at a high rate to extensively permissive SEs and transcriptionally active gene bodies (106,107). Comprehensive analyses dissecting the molecular mechanism involved show that Pol II stalled by convergent transcription collision recruits off-target AID to the largely permissive environment of SEs, which subsequently initiates mutations and translocations. Translocation can bring proto-oncogenes under the control of SEs, resulting in over-expression and B cell lymphomagenesis (108) (Figure 6C).

Developmental defects
Besides cancer, the proto-oncogene MYC is also linked to various developmental defects, including a common congenital malformation, cleft lip with or without cleft palate (CL/P) (109)(110)(111). The synteny of the MYC locus is conserved between human and mouse. More interestingly, most of the mutation hotspots are enriched in cell type-specific H3K27ac peaks, representing tissue-specific MYC enhancers and SEs (102,112-114). Using genome editing tools to manipulate the mouse Myc locus, Uslu et al. confirmed that a 300-kb medionasal super-enhancer (MNE) was responsible for 30% of Myc expression in face and limb during development, but not in liver or heart (102).
In comparison, manipulation of a typical enhancer may only have a marginal effect, considering its low regulatory activity (58,59,113,115). Thus, deletion of larger regulatory elements like SEs could help in the detection of such effects and in better understanding the relevant mechanisms. Systematic genome-wide validation of SE function in development and disease is an intriguing prospect. The fast development of the genome editing tool CRISPR/Cas9 (116,117) makes in vivo validation studies more feasible, and such studies should be more broadly applied in the near future (57,58,115).
A recent work published by the Mundlos lab provided in vivo evidence from an animal model that the CTCF-organized topology of 3D chromatin structure was responsible for SE functions (118). Firstly, they identified three heterozygous deletions in neighboring chromosome regions in patients with three different genetic limb abnormalities: brachydactyly, F-syndrome and polydactyly. Subsequently, they used the CRISPR/Cas9 editing tool to reconstruct the chromosome deletions and successfully reproduced the disease phenotypes in mouse models. A 150-kb SE fragment was found to be the target, as disruption of chromatin structure by deletion, inversion or duplication rewired this regulatory element to genes in a neighboring TAD, from which it was originally insulated in wild-type cells. This mechanism is very similar to what was observed for the oncogenes TAL1 and LMO2 in T-ALL, as a TAD is itself an insulated neighborhood (100).

Potential therapeutic targets
Since SEs are heavily modified biochemically by various enzymes such as acetyltransferases or methyltransferases, they exhibit potentially 'drug-targetable' characteristics. Two unique characteristics of SEs that we discussed above further make them particularly relevant targets for cancer therapy: (i) SEs are more sensitive to external signals than any other genomic regions; (ii) SEs control cell identity genes in both healthy and diseased cells. Both features could potentially help ensure the specificity of drugs, one of the most challenging issues in cancer therapy.
The epigenetic modification enzymes are found mutated at high frequency in cancer patients (∼20-40%), such as DNMT3A (DNA cytosine de novo methyltransferase 3A) in AML and MDS, and CBP/EP300 (lysine acetyltransferases) in bladder cancer and lymphomas (reviewed in (119)). The DNMT inhibitors, 5-azacytidine and decitabine, have been clinically used to effectively treat high risk myelodysplastic syndrome and AML (120). Among others, inhibitors of histone lysine deacetylases (HDACs), acetyltransferases (CBP) and chromatin remodelers are all available in the pharmaceutical market for cancer treatment (119). In addition to histone modification enzymes, targeting transcription coactivators is another feasible therapeutic strategy. Drugs such as JQ1 and THZ1, inhibiting BRD4 (7) and CDK7 (121), respectively, have been shown to specifically target tumor-specific SEs, providing an efficient way for targeting only cancer cells.
BRD4 and JQ1. The first compound reported to target SEs is 'JQ1', designed to specifically inhibit the Bromodomain and Extraterminal (BET) superfamily member BRD4, which highly occupies SEs in cancer cells (7,122,123). BRD4 has been known as a Mediator-interacting partner since the 1990s (124) and it is also a drug target in leukemia (125). Treating multiple myeloma tumor cells with JQ1 causes a disproportionate loss of BRD4 binding to the genome, more pronounced at SEs than at TEs and other regions (7). Therefore, SE-controlled genes, which in cancer mostly include oncogenes such as MYC, are more affected than other genes in tumor cells. Subsequent independent studies using JQ1 to target BRD4 and SEs have demonstrated similar effects in a broad spectrum of cancer types, such as colorectal cancer (126), ovarian cancer (127), Merkel cell carcinoma (128), B cell lymphoma (129), and alveolar rhabdomyosarcoma (123). Of note, JQ1 is currently still being evaluated in phase I and II clinical trials, and resistance to JQ1 has been reported in a few cancer cases (130)(131)(132).
CDK7 and THZ1. Mediator super-enrichment in SEs also contributes to accumulation of the Pol II holoenzyme and other subunits of the transcription machinery. Cyclin Dependent Kinase 7 (CDK7) is a kinase subunit of the general transcription factor TFIIH, required for transcription machinery assembly. It can phosphorylate the C-terminal domain (CTD) of Pol II and facilitate transcription initiation. Targeting CDK7 with its small-molecule covalent inhibitor THZ1 can effectively inhibit expression of the master TF RUNX1 in T-ALL by disrupting its SE-associated transcription regulation circuitry (121). THZ1 shows a strong, dose-dependent inhibitory effect on tumor growth in a human xenograft mouse model. Similarly, THZ1 downregulates amplified MYCN expression in a high-risk neuroblastoma mouse model, notably without any systemic toxicity (133). Strikingly, when the same strategy is applied to small-cell lung cancer (SCLC), for which there has been no significant therapeutic progress since chemotherapy was introduced in the 1970s, SE-associated genes, including MYC family genes, are highly vulnerable to THZ1 treatment (134).
These two examples show that SE-associated protein factors are ideal options for drug targets. In addition, recent advances in genome editing technologies, CRISPR/Cas9 in particular, offer the possibility of novel gene therapies that directly correct pathogenic SEs (93). Of note, not only SEs and their directly interacting factors, but also the pathways and master genes associated with disease-specific SEs provide potential therapeutic targets, such as the transcription factors LMX1A, EOMES and LHX2 for group 4 medulloblastoma (98,99).

CONCLUSIONS
In this Survey and Summary, we have highlighted the unique characteristics of SEs and their biological role in transcriptional regulation in health and disease. In particular, we addressed the 'elephant in the room': are SEs clusters of additive enhancers or a novel type of synergistic regulatory element? To address this question, we provide multiple lines of additional evidence suggesting that they are highly likely to be a unique biological entity in transcriptional regulation. Firstly, this is supported by a number of distinct characteristics of SEs compared to other enhancers, including their sequence composition, genomic size, regulatory activity, proteins bound, and genes under their regulation (also reviewed previously (135)). One of the most striking features of SE loci to emerge recently is that they are likely to be isolated from other chromatin domains by forming an insulated nuclear compartment through liquid-liquid phase-separated, membraneless condensates. In these focused compartments, they are more likely to have the necessary autonomy to precisely drive the regulation of genes controlling cell identity (40,136). Secondly, knockout experiments targeting SE components further demonstrate that functional hierarchy and synergistic interactions exist among different constituents within an SE (32,137). For instance, in a study targeting the constituents of the α-globin gene-associated SE (11), a linear-logistic model, which allowed for interactions between constituent enhancers, explained the knockout results better than a simple linear model (12), suggesting that the regulatory relationships between individual constituents are 'synergistic'. Finally, SE constituents display tissue specificity as an entire group. For example, SEs with multiple constituents at the MYC locus are located in a non-overlapping manner among different cell types (8,115): CRC tumors generally harbor an SE of MYC ∼300 kb upstream of its promoter, while in acute leukemia an SE 1.7 Mb downstream of the TSS plays a primary role in activating c-MYC expression (138,139). These observations further suggest the synergistic role of SE constituents in regulating gene expression. We therefore conclude that SEs form a distinct regulatory entity beyond additive clustering of independent enhancers, although more in vivo evidence is still required to further support this concept. Given the involvement of SEs in human pathology, future studies that generate greater understanding of their function as holistic entities will be important for the development of new biomarkers and treatments that target these powerful genomic regulatory structures.
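The distinction drawn above between a simple linear model and a linear-logistic model of constituent knockouts can be made concrete with a small sketch. The exact model forms used in (12) are not reproduced here; the functions below, the three equal weights and the bias term are hypothetical choices made only to show the qualitative difference: in an additive model, deleting a constituent always costs the same amount of expression, whereas in a logistic model the cost depends on which other constituents are present.

```python
import math


def additive_expression(present, weights):
    """Additive model: expression is the sum of the weights of the
    constituents that are present (hypothetical form)."""
    return sum(w for w, x in zip(weights, present) if x)


def logistic_expression(present, weights, bias):
    """Logistic model: summed constituent activity is passed through a
    sigmoid, so contributions are not independent (hypothetical form)."""
    z = bias + sum(w for w, x in zip(weights, present) if x)
    return 1.0 / (1.0 + math.exp(-z))


def knockout_effect(model, present, index, **params):
    """Expression lost when constituent `index` is deleted."""
    without = list(present)
    without[index] = False
    return model(present, **params) - model(without, **params)


WEIGHTS = [2.0, 2.0, 2.0]  # assumed activities of three constituents
BIAS = -4.0                # assumed baseline of the logistic model

all_present = [True, True, True]
only_first = [True, False, False]

# Additive model: deleting constituent 0 costs the same in both contexts.
add_full = knockout_effect(additive_expression, all_present, 0, weights=WEIGHTS)
add_alone = knockout_effect(additive_expression, only_first, 0, weights=WEIGHTS)

# Logistic model: the cost of the same deletion depends on the context.
log_full = knockout_effect(logistic_expression, all_present, 0,
                           weights=WEIGHTS, bias=BIAS)
log_alone = knockout_effect(logistic_expression, only_first, 0,
                            weights=WEIGHTS, bias=BIAS)

print(f"additive: {add_full:.2f} vs {add_alone:.2f}")
print(f"logistic: {log_full:.2f} vs {log_alone:.2f}")
```

With these assumed parameters, the logistic model predicts a larger loss when a constituent is removed from an otherwise intact SE than when it is removed in isolation, which is one way to express the context dependence that the text describes as 'synergistic'.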

DATA AVAILABILITY
A core collection of SEs used for the meta-analyses in this Survey and Summary and the computational scripts are available in the GitHub repository (https://sunlightwang.github.io/Super-Enhancers/).