Analysis of long and short enhancers in melanoma cell states

Understanding how enhancers drive cell-type specificity and efficiently identifying them is essential for the development of innovative therapeutic strategies. In melanoma, the melanocytic (MEL) and the mesenchymal-like (MES) states present themselves with different responses to therapy, making the identification of specific enhancers highly relevant. Using massively parallel reporter assays (MPRAs) in a panel of patient-derived melanoma lines (MM lines), we set to identify and decipher melanoma enhancers by first focusing on regions with state-specific H3K27 acetylation close to differentially expressed genes. An in-depth evaluation of those regions was then pursued by investigating the activity of overlapping ATAC-seq peaks along with a full tiling of the acetylated regions with 190 bp sequences. Activity was observed in more than 60% of the selected regions, and we were able to precisely locate the active enhancers within ATAC-seq peaks. Comparison of sequence content with activity, using the deep learning model DeepMEL2, revealed that AP-1 alone is responsible for the MES enhancer activity. In contrast, SOX10 and MITF both influence MEL enhancer function with SOX10 being required to achieve high levels of activity. Overall, our MPRAs shed light on the relationship between long and short sequences in terms of their sequence content, enhancer activity, and specificity across melanoma cell states.


Introduction
Enhancers are crucial regulatory regions in the genome that control cell type-specific gene expression. Identifying enhancers helps to better understand cell identity and is key to develop therapies targeting a singular relevant cell type in a disease. To date, accurate prediction of enhancer location and celltype activity remains a challenge. Both the presence and clustering of transcription factor binding sites (TFBSs) are good predictors of enhancer activity (Gasperini et al., 2020;King et al., 2020). Yet, such an approach requires prior knowledge of the cis-regulatory grammar in the studied cell types as only a small proportion of the TFBSs found in the genome are bound by the corresponding transcription factor (TF) (Yáñez-Cuna et al., 2012). Another strategy to identify candidate enhancers is to use active enhancer marks such as H3K27ac and chromatin accessibility (Gray et al., 2017;Minnoye et al., 2021;Rada-Iglesias et al., 2011). The most successful studies, combining this approach with transcriptome data, generated libraries with up to 60% of active enhancers in the target cell type (Gorkin et al., 2020;Graybuck et al., 2021). Massively parallel reporter assays (MPRAs) have been developed to screen the activity of thousands of sequences simultaneously (Arnold et al., 2013;Inoue and Ahituv, 2015;Melnikov et al., 2012;White et al., 2013). However, limitations of sequence synthesis constrain one to choose either a large number of short sequences (e.g., thousands of sequences of 150-250 bp) or a small number of longer sequences (e.g., dozens of sequences of 500-1000 bp) (Inoue and Ahituv, 2015). This issue, combined with the difficulty of identifying putative enhancers, leads to a low rate of active enhancers in MPRAs.
Here, we study enhancer location, specificity, and regulatory grammar in melanoma using a variety of MPRA strategies. Melanoma exhibits pronounced heterogeneity within and between patients (Grzywa et al., 2017). Two main subtypes or cell states are discernible, the melanocytic (MEL) and the mesenchymal-like (MES) state (Hoek et al., 2008;Hoek et al., 2006;Verfaillie et al., 2015), as well as more recently identified variants of the MEL state, such as the neural crest-like and intermediate states (Rambow et al., 2018;Tsoi et al., 2018;Wouters et al., 2020). The MEL and MES subtypes display distinct epigenomic and transcriptomic profiles, resulting in divergent phenotypes such as migration and therapy response Verfaillie et al., 2015). Thus, the identification of subtype-specific enhancers may be relevant for therapy, where it could improve safety and efficiency by narrowing down the effect of the treatment to a specific population. Comparisons between MEL and MES yield thousands of regions with differential acetylation (H3K27ac) and accessibility. However, it is unclear which of these subtype-specific regions function as active enhancers and which TFs are responsible for their activity.
In this study, we analyzed regions with MEL-and MES-specific H3K27ac chromatin marks proximal to differentially expressed genes. We designed MPRA experiments to test those regions at three different levels in a panel of patient-derived malignant melanoma (MM) lines. Our results precisely locate the origins of enhancer activity within the larger H3K27ac domains. In addition, we can accurately predict their subtype specificity and ultimately identify a set of rules governing MEL and MES enhancer activity. Furthermore, we show that a melanoma deep learning model (DeepMEL2; Atak et al., 2021) trained on ATAC-seq data pinpoints which TFBSs drive enhancer activity and specificity.

Results
Design of MPRA libraries based on H3K27ac, ATAC-seq, and synthetic sequences H3K27ac ChIP-seq peaks are often used for the selection of candidate enhancers (Creyghton et al., 2010;Fox et al., 2020;Fu et al., 2018). However, such peaks can encompass large genomic regions, often 2-3 kb long, while enhancers are usually only a few hundred base pairs in size (Gasperini et al., 2020;Li and Wunderlich, 2017). To investigate the relationship between H3K27ac signal, chromatin accessibility peaks, and enhancer activity, we designed MPRA libraries at three different levels: 1.2-2.9-kb-sized H3K27ac ChIP-seq peaks, 501-bp-sized ATAC-seq peaks that fall within the H3K27ac regions, and 190-bp subsequences tiling the entire H3K27ac regions.
We designed the H3K27ac ChIP-seq-based library (H3K27ac library) by selecting regions that are specifically acetylated in either the MEL or the MES melanoma cell state and located around differentially expressed genes in a panel of 12 melanoma lines: 3 MES lines (MM029, MM047, and MM099) and 9 MEL lines that cover a spectrum from pure melanocytic to intermediate melanoma (MM001, MM011, MM031, MM034, MM057, MM074, MM087, MM118, SKMEL5) (see Materials and methods, Figure 1-figure supplement 1a, Figure 1a; Minnoye et al., 2020). A special consideration was given to regions overlapping ChIP-seq peaks for SOX10 and MITF (for MEL regions) or AP-1 (JUN and JUNB; for MES regions), known regulators of each state . A total of 35 MESand 18 MEL-specific regions, with an average size of 1987 bp, were amplified from genomic DNA (Supplementary file 1). Regions were named 'GENE_(-)000X,' where GENE is the associated target gene, (-)000 is the distance to the TSS, and X denotes whether the enhancer is distal, intronic, or at a promoter position (see Materials and methods). The H3K27ac ChIP-seq signal across these selected a.   . ATAC-seq data from the same lines were used to identify accessibility peaks within these regions. Finally, the regions were tiled with 190 bp tiles with a shift of 20 bp (sublibrary A). Sublibrary B was generated by shifting all tiles 10 bp downstream. Cluster-Buster (CBust) was used to regions displays a good correlation between cell lines of the same subtype and a negative correlation between cell lines of a different subtype (Figure 1-figure supplement 1b). This correlation is also observed in the ATAC-seq signal and the target gene expression (Figure 1-figure supplement 1c, d, and h). We created two vector libraries based on the CHEQ-seq vector backbone (Verfaillie et al., 2016) by cloning the sequences upstream (5′ position) or downstream (intron position) of a minimal promoter (SCP1, see Materials and methods, Figure 1b). Next, we designed a second library consisting of the ATAC-seq peaks contained within the H3K27ac library regions (Figure 1a). We used only the H3K27ac peaks that are identified as active in the H3K27ac library or that are assigned as MEL-or MES-specific regulatory regions (respectively represented by regions belonging to topic 4 and topic 7) in our previously published cisTopic analysis of ATAC-seq data from 16 human melanoma cell lines . Each sequence was defined by taking the summit of the ATAC-seq peak and extending 250 bp on each side, resulting in a 501 bp candidate sequence. In total, 28 MES-and 18 MEL-specific ATAC-seq peaks were selected from 37 of the 53 H3K27ac regions (for nine regions, two ATAC-seq peaks were selected). Over the 501 bp of those regions, the correlation of the H3K27ac ChIP-seq, ATAC-seq, and RNA-seq signals between the different cell lines remains the same as for the H3K27ac library (Figure 1-figure supplement 1e-g). We cloned the regions in the CHEQ-seq vector, upstream of the SCP1 promoter ( Figure 1b). In addition, we cloned the same set of sequences in the STARR-seq vector (an alternative MPRA vector) to assess assay-related variability (Muerdter et al., 2018).
Finally, to locate the precise origin of enhancer activity, we generated two tiling libraries (A and B, see Materials and methods), encompassing the entire H3K27ac regions. The tiles are 190 bp long with a 20 bp shift between consecutive tiles. The tiling for library A starts at nucleotide position 1 of the H3K27ac ChIP-seq regions while library B starts at position 11, resulting in a final tiling resolution of 10 bp when both sublibraries are taken into account.
We also sought to probe the effect of mutations within putative TFBSs in the selected H3K27ac regions. We used Cluster-Buster (CBust;Frith et al., 2003) with position weight matrices (PWMs) for binding sites of key regulators of MEL (SOX10, MITF, TFAP2A) and MES (AP-1 and TEAD)  to identify candidate TFBSs and generated tiles carrying mutated versions of these motifs (see Materials and methods). For each sublibrary, 800 shuffled tiles were generated as negative controls, resulting in a total of 7412 and 7356 tiles for sublibraries A and B, respectively ( Figure 1a). Each sublibrary is separately cloned upstream of the SCP1 promoter in the CHEQ-seq vector and is transfected individually (Figure 1b, lower-right panel).

Most MEL-specific acetylated regions harbor enhancer activity in MEL lines
We first transfected all MPRA libraries in the most melanocytic line, MM001 Wouters et al., 2020). Of the MEL-specific H3K27ac regions, 75% (14/18) display significant enhancer activity (Benjamini-Hochberg adjusted p-values < 0.05, see Materials and methods) in MM001 with a mean log2 fold change (FC) of 0.23 compared to 26% (8/32) for the MES-specific regions with a mean log2 FC of -1.21 (Figure 2a). The activities are consistent across the two library designs (enhancers cloned into the intron or upstream of the TSS; Figure 2-figure supplement 1a and b). The ATACbased library recapitulates these activities, suggesting that the enhancer activity is contained within the sequence of the ATAC-seq peak (Figure 2b, Figure 2-figure supplement 1c-e). Interestingly, in the five MEL-specific H3K27ac regions where two ATAC-seq peaks were assessed, only one of the two recapitulates the activity of the encompassing region ( Figure 2-figure supplement 2). This was independently confirmed using the STARR-seq MPRA (Figure 2-figure supplement 1c).
identify transcription factor (TF) motifs, and new tiles were generated with mutated motifs. (b) Reporter vector configurations used for the evaluation of the H3K27ac enhancers (left panel), the ATAC-seq enhancers (top-right panel), and the enhancer tiling (bottom-right panel). SCP1, Super Core Promoter 1; BC, barcode.
The online version of this article includes the following figure supplement(s) for figure 1:  f. The online version of this article includes the following source data and figure supplement(s) for figure 2:

Figure 2 continued on next page
Next, we examined the activity of all 190 bp sequences tiled along the entire H3K27ac regions. We confirmed that the majority of active tiles (92.2%) are located within an ATAC-seq peak (Figure 2c and d, Figure 2-figure supplement 2) and identified active tiles in 7 out of the 10 most active MEL ATAC-seq-based enhancers. Short 190 bp regions can thus often recapitulate the enhancer activity of the larger encompassing region. When two or more consecutive tiles are active, the enhancer may be contained in an even smaller sequence or the activity is coming from independently active elements close to one another.
We recently trained a deep learning model on cis-regulatory topics (sets of co-accessible genomic regions clustered by cisTopic; Bravo  from 30 melanoma lines, called DeepMEL2, that accurately predicts the accessibility and activity of a sequence in the different melanoma subtypes . Each topic used to train the model regrouped accessible regions found in one cell line, in a specific subtype, or in all cell lines. Two topics are associated with the MEL subtype, topics 16 and 17, mostly focused on SOX and MITF motifs, respectively. We scored our 190 bp tiles with DeepMEL2 and found high MEL prediction scores (>0.10, see Materials and methods) specifically within ATAC-seq peaks (Figure 2d). Of the top DeepMEL2 predictions for MEL specific topics, 11% (topic 16) and 17.3% (topic 17) are active tiles in the MPRA (with 0.25/0.375 recall and 0.11/0.173 precision for topics 16 and 17, respectively). These low precision values may be explained by the fact that the DeepMEL2 model was trained on ATAC-seq data, thus yielding high prediction scores within ATAC-seq peaks, yet not all of these show positive MPRA activity.
In some cases, we identified multiple acetylated regions near one gene. For the tyrosinase (TYR) gene, expressed specifically in MEL lines ( Figure 1-figure supplement 1h), three regions were selected as MEL-specific and tested at the acetylation, accessibility, and tiling levels ( Figure 2c). TYR_-9-D and TYR_-1-D regions display high reporter activity, which is subsequently found in the selected ATAC-seq peak. Activity is further narrowed down to tiles corresponding to the DeepMEL2 predictions ( Figure 2c). The TYR_P region, at the gene's promoter site, has a low activity that is also not found when tiling the region, despite the DeepMEL2 predictions. Those findings suggest that TYR expression in MEL lines is largely dependent on the activity of distal enhancers.
Some other enhancers that are active in the H3K27ac and ATAC-seq libraries are not recapitulated in the tile library (e.g., SOX10_-34-D; Figure 2-figure supplement 2c). This can be due to technical reasons, such as the small size of the tiles. Nevertheless, from the combination of the ATAC-seq and enhancer tiling MPRAs, we can conclude that not all subtype-specific ATAC-seq peaks function as a stand-alone enhancer.
We finally compared signals of H3K27ac ChIP-seq mean score, ATAC-seq mean score, and their corresponding MPRA signals (Figure 2e and f). With the H3K27ac library (Figure 2e), the acetylation signal measured over the selected regions correlates well with the accessibility signal (Pearson's correlation r = 0.77), indicating that ATAC-seq peaks are found within the selected acetylated regions and maintain the same differential signal. Active enhancers are found in the majority (14/18) of the MEL-specific acetylation regions with ATAC-seq signal. This trend is also visible in the ATAC-seq-based library, with most of the active enhancers detected in ATAC-seq peaks (13/19) ( Figure 2f). However, the moderate correlation between ATAC-seq signal and CHEQ-seq activity in the corresponding library (Spearman's rho = 0.48) also indicates that the peak mean signal is not a good predictor of the activity level of an enhancer. In part, this may be due to confounding of ATAC and H3K27ac read depth by genomic copy number aberrations. Also, the activity displayed by some MES-specific regions lacking ATAC-seq signal in MM001 suggests that closed regions in the genome can still harbor activity in an episomal MPRA.
Source data 5. DeepMEL2 prediction scores for tiling library B.

Figure 2 continued
In conclusion, our enhancer selection resulted in a high rate of active enhancers in MM001 and the design of our MPRA libraries allowed us to precisely pinpoint the origins of, at least part of, the enhancer activity.

MES-specific H3K27ac/ATAC regions are active in MES lines
Next, we transfected all libraries in the MES line MM029. The activity profiles in this line show that, as expected, the majority of MES enhancers display activity at both the H3K27ac and ATAC-seq level (Figure 3a and b, Figure 3-figure supplement 1a-e). In regions with two selected ATAC-seq peaks, both MPRA approaches agree on which peak is driving activity (Figure 3b and d, Figure 3-figure supplement 1c, Figure 3-figure supplement 2).
Multiple regions around the Collagen Type V Alpha 1 Chain (COL5A1) gene, up to 100 kb upstream of the TSS, were found to be specifically acetylated in MES lines, and we included a total of seven into the library (Figure 3c.). Four regions showed significant activity in MM029 (Figure 3c). This activity is further confirmed in the ATAC-seq and tiling libraries: the ATAC-seq peaks within the four active H3K27ac regions are all active, while the ATAC-seq peaks within the three negative H3K27ac regions are all negative. In the COL5A1_-17-D region, tiling suggests the presence of three distinct enhancers, including two that are located within small ATAC-seq peaks that were not selected for the ATAC-based library.
DeepMEL2 is trained on both MEL-and MES-accessible regions, and topic 19 has been shown to be the best-performing MES topic . In our data, high topic 19 predictions often overlap with active tiles in MM029 (0.181 precision, 0.537 recall, Figure 3c and d). Because several other cis-regulatory topics contribute to the prediction of the MES subtype, some tiles do not display a high prediction score for topic 19 despite their activity and are better explained by other topics.
The small shift between each tile and the use of two overlapping libraries provide a high number of measurements throughout the regions, which allows for accurate detection of weakly active enhancers. Such enhancers are found in the SERPINE1_-110-I region, where the SERPINE1_-110-IA ATAC-seq peak is inactive in the CHEQ-seq and STARR-seq assays (Figure 3d, Figure 3-figure supplement 1c), but the tiling assay shows robust enhancer activity. Of the two ATAC-seq peaks in the FOSL2_-117-I region, neither one recapitulates the activity of the acetylated region. In contrast, the tiling assay reveals clear enhancer activity in both peaks ( Figure 3-figure supplement 2b).
Similar conclusions to those above for MEL enhancers can now be drawn for MES regions regarding the relationship between H3K27ac, ATAC-seq signal, and enhancer activity (Figure 3e and f). ATACbased and tiling CHEQ-seq assays show that most active enhancers in the H3K27Ac regions reside within ATAC-seq peaks (151/164 active tiles are found in ATAC-seq peaks; 92.1%). Irrespective, ATAC-seq peak mean signal remains a poor predictor of the level of enhancer activity, at least as read out by CHEQ-seq. Moreover, the presence of a differentially accessible ATAC-seq peak does not guarantee enhancer function. Indeed, of 26 differentially accessible peaks, only 14 show demonstrable activity (54%).
In conclusion, MPRAs performed in MM001 and MM029 have shown a high success rate of selected regions to display activity specifically in their corresponding cell state. However, MM001 and MM029 lie at the extremes of the MEL-MES spectrum. To further investigate how the activity of these regions scales along the MEL-MES axis, we studied them in five additional melanoma cell lines, representing more intermediate or transitory melanoma states (Tsoi et al., 2018;Wouters et al., 2020).

MES enhancers show lower but consistent activity in intermediate lines
We expanded our panel of cell lines to include two additional MES lines (MM047 and MM099) and three MEL-intermediate lines (MM057, MM074, and MM087). These three lines have high SOX10 and MITF expression (hallmarks of the MEL subtype), yet also show both marker expression and phenotypic characteristics typical for the MES subtype (e.g., AXL expression, TGFb1 signaling activity) (Tsoi et al., 2018). Furthermore, in contrast to MM001, when SOX10 expression is lost, these lines shift toward a MES subtype, with expression of AP-1 and increased chromatin accessibility in MES-specific regions . MM057, MM074, and MM087 were part of the cell lines used for the selection of MEL-specific H3K27ac ChIP-seq regions. As such, they display an acetylation and accessibility profile as well as a transcriptional activity of the associated genes similar to MM001 ( H3K27ac enhancers ATAC-seq enhancers f.  In summary, MEL enhancers display a consistent and specific activity in MEL lines while MES enhancers are not only active in MES lines but also in intermediate lines and to a lesser extent in MM001. This behavior suggests that enhancer activity is driven by the expression of TFs responsible for the MEL and MES subtypes. SOX10, MITF, and TFAP have been identified as drivers of the MEL state and AP-1 and TEAD of the MES state . MM029 expresses MITF at a low level, but this does not result in MEL enhancer activity, suggesting that MITF (or its isoforms in MM029) alone is not sufficient.

Key transcription factor binding sites explain melanoma enhancer activity
To better understand which TFs are important for MEL enhancer activity, we included tiles with MITF, SOX, and TFAP binding site mutations in our tiling libraries. SOX10 and MITF both represent Dark gray areas are regions not covered by the tiling library. CHEQ-seq activity is visible in the 'H3K27ac enhancers' and 'ATAC-seq enhancers' tracks. Benjamini-Hochberg adjusted p-values: *<0.05; ***<0.001. Dashed boxes: regions not recovered following DNA synthesis, cloning, or massively parallel reporter assay (MPRA). Heatmaps of H3K27ac library (e) and ATAC-seq library (f), displaying H3K27ac ChIP-seq signal, ATAC-seq signal, and enhancer activity in MM001 ordered by MPRA values. Only MPRA values of significantly active enhancers are displayed.
The online version of this article includes the following source data and figure supplement(s) for figure 3: Source data 1. CHEQ-seq tiling library A activity.
Source data 2. DeepMEL2 prediction scores for tiling library A.   established lineage TFs in melanocytes and in melanoma, and TFAP2 has been found to co-occur with MITF in active regulatory elements in melanocytes (Goding, 2000;Harris et al., 2010;Seberg et al., 2017). SOX10 was previously shown to be necessary for melanoma formation and maintenance by controlling cell survival and cell cycle (Cronin et al., 2013;Shakhova et al., 2012). MITF is considered to be a melanoma oncogene (Garraway and Sellers, 2006) and has been implicated in various cellular processes (Goding and Arnheiter, 2019;Levy et al., 2006).
The comparison of wild-type versus mutant tiles shows that TFAP does not affect the activity of the profiled enhancers in any of the MEL lines ( Figure 5-figure supplement 1). On the other hand, loss of SOX or MITF significantly affects enhancer activity. KIT_114-D provides a good example of both MITF and SOX motif contributions in MM001 (Figure 5a). The tiling library confirms the location of enhancer activity to be in the KIT_114DA region, where both motifs are found to influence the activity. Mutation of the SOX binding site abolishes tile activity. The same is observed when the MITF binding site is mutated, with the notable exception of one tile also containing the SOX motif

SOX10-dependent ATAC-seq peaks with enhancer activity are enriched in MITF motifs
To study MEL enhancers in more detail, we designed a new library. A SOX10 knockdown (KD) with siRNA was performed on MM057 and MM087, shifting these lines to a MES phenotype, and was followed by ATAC-seq after 0, 24, 48, or 72 hr . After cisTopic analysis, we selected 1461 ATAC-seq peaks where accessibility is lost upon KD and tiled them with 190-bp-long sequences and 120 bp shifts, resulting in 6696 individual sequences ( Figure 5-figure  supplement 4a-d). We cloned this library in the CHEQ-seq vector and transfected it in MM087. Analysis of tile activity revealed that 15.1% of the selected ATAC-seq peaks exhibit enhancer function in MM087 ( Figure 5-figure supplement 4c and d). We performed Gene Set Enrichment Analysis (GSEA) for SOX10 and MITF ChIP-seq in 501mel melanoma line and TFAP2A in human primary melanocytes on the tiles ranked according to their activity (Figure 5d). Only the MITF ChIP-seq signal was enriched in active tiles, indicating that SOX10-dependent regions containing MITF sites are preferentially active. As before, these observations were confirmed by differential motif discovery of active

Synthetic SOX-MITF motif combinations highlight MEL enhancer regulatory grammar
To further investigate a possible interaction between SOX10 and MITF and see if a MEL enhancer can be generated with only those two TFBSs, we designed sequences consisting of combinations of SOX dimer and MITF motifs spread over a 259 bp background sequence (Figure 5e, top panel). Twentyfour different combinations were generated in two background sequences, cloned in the CHEQ-seq vector and transfected in MM001, MM074, and MM087. The activity of the enhancer progressively increases with the number of SOX motifs (Figure 5e, middle panel). The presence of MITF motifs additionally increases enhancer activity, but this differs from the effect of the SOX motifs (Figure 5e, bottom panel). We see a progressive increase of the activity based on the number of MITF binding sites only when there is also a SOX dimer motif present in the sequence. In the absence of a SOX motif, enhancer activity remains low in comparison with sequences containing at least one SOX motif, even when as many as six MITF motifs are present. The presence of multiple SOX motifs greatly increases the enhancer activity, reducing the influence of MITF (Figure 5e, bottom panel). DeepMEL2 predictions on the synthetic sequences confirm these observations ( Figure 5-figure supplement 5a and b): topic 16 predictions show the same constant increase in score with the number of SOX binding sites. Topic 17 predictions also increase based on the number of MITF motifs when one SOX motif is present and are brought down to background level when SOX is absent. These results highlight a cis-regulatory grammar in MEL enhancers involving SOX10 and MITF. As previously described in enhancers regulating pluripotency in mouse embryonic stem cells, regardless of TFBSs positioning, the number of binding sites remains the predominant factor determining activity ( Figure 5-figure supplement 5c; King et al., 2020). The COL5A1_-17-D region illustrates the influence of AP-1 (Figure 6a). Within this region, three enhancers are found active in all cell lines bar MM001, which shows activity only for the second enhancer. The enhancer activity of the H3K27ac and ATAC-based regions across the tested MM lines agrees with the tile activity where MM001 has no or low activity and intermediate and MES lines have a strong activity (Figure 6a and b). Our previous characterization of the melanoma cell states highlighted AP-1 activity in MEL-intermediate cell lines, which could explain the observed activity of some MES enhancers in MM057, MM074, and MM087 .

AP-1 sites can explain MES enhancer activity
Focusing on the first active region of COL5A1_-17-D, enhancer activity is limited to a subset of the tiles containing an AP-1 motif (Figure 6c). Importantly, the observed activity of a tile agrees with its DeepMEL2 prediction score. The generation of DeepExplainer profiles from those predictions reveals the presence of a ZEB repressor motif next to the first AP-1 site (Figure 6d; Postigo and Dean, 1999). In line with this, enhancer activity is absent from all tiles containing this motif (Figure 6c and d). The comparison of WT versus AP-1 mutant tiles specifically in that region further confirms the activator function of AP-1 in MEL-intermediate and MES lines (Figure 6c-e). Motif enrichment analysis on both the active and most accessible tiles in MES lines found only AP-1 enriched (Figure 6f). AP-1 (JUN and JUNB) ChIP-seq signal is furthermore enriched in H3K27ac regions displaying activity, as well as in MES differentially acetylated regions from Verfaillie et al., 2015 (t-test p-value < 10 -16 , Figure 6figure supplement 2). As for MITF and SOX motifs, we validated the function of AP-1 motifs in three active MES-specific ATAC regions in MM099 via luciferase assay ( Figure 6-figure supplement 3a). In all three MES ATAC regions, the loss of AP-1 sites results in a reduction of the luciferase activity ( Figure 6-figure supplement 3b).
The gene regulatory network of the MES subtype was previously found to be mainly controlled by AP-1 and TEAD . Our MPRA results show that only AP-1 has a direct effect

Discussion
We have investigated enhancer activity of regions specific to the two main subtypes of melanoma across a panel of patient-derived MM lines. By studying candidate enhancers with different sequence lengths and interpreting the results in conjunction with deep learning predictions, we obtained a better understanding of the regulatory grammar, function, and specificity of melanoma enhancers.
The first studies that used MPRAs were based on very short sequences (84-145 nt) and identified a modular and motif-centric definition of enhancer elements (Kheradpour et al., 2013;Melnikov et al., 2012;White et al., 2013). The use of captured genomic regions allowed for longer (hundreds of bp) and more (several millions) sequences to be tested in parallel (Arnold et al., 2013;Verfaillie et al., 2016;Wang et al., 2018). While providing a genome-wide view on enhancer activity, the cloned fragments are highly variable in size and do not readily allow the study of corresponding mutant sequences. Furthermore, most studies using MPRAs have been limited to one cell line or two very different cell lines (e.g., K562 and HepG2;or S2 and OSC in Drosophila;Arnold et al., 2013;Ernst et al., 2016), leaving the evaluation of enhancer specificity unassessed or reduced to individual examples. Here, we perform MPRAs in a panel of seven cell lines from the same disease, allowing us to validate the cell state specificity of the selected regions.
Our approach of selecting putative enhancers based on specific H3K27ac ChIP-seq peaks, located close to differentially expressed genes, resulted in a high success rate (60-75%), comparable to other studies leveraging a similar approach (Gorkin et al., 2020;Graybuck et al., 2021). In contrast, our large-scale library based solely on ATAC-seq peaks that decrease after SOX10 KD contained only 15.1% active enhancers ( Figure 5-figure supplement 4). This indicates that success rates of enhancer activity vary widely. Whereas most of the differential H3K27ac peaks near marker genes are indeed enhancers, many ATAC-seq peaks scattered across the genome do not act as enhancers when tested in reporter assays, even if they were affected by TF KD. We also find active enhancers in closed chromatin regions, a phenomenon previously described in whole-genome STARR-seq studies in human and Drosophila (Arnold et al., 2013;Liu et al., 2017a). Those closed regions can often be found accessible in other cell lines of our panel, and their ectopic activity suggests that the chromatin state in the genome differs from the episomal plasmid, keeping the enhancer inactive in the genome despite what seems to be a sufficient TF expression to trigger their activity in our episomal assay. Conversely, some enhancers may rely on 3D chromatin structure and the formation of enhancer-promoter looping to activate transcription via long-range interaction (Liu et al., 2017b). This mechanism would require the native chromatin context that cannot be readily recapitulated by our reporter assay.
The influence of the size of the candidate enhancer has previously been assessed, although to a limited extent, and only by choosing sizes that are not representative of chromatin structures (Klein et al., 2020). Here, we tested long sequences corresponding to entire H3K27ac regions up to 2.9 kb, 501 bp ATAC-seq peak regions located within the larger H3K27ac domains, and finally, short 190 bp sequences tiling acetylated regions. This revealed that the actual enhancer within a H3K27ac region is usually located within an accessible subregion. Furthermore, the activity of that ATAC-seq peak can often be recapitulated by a sequence of 190 bp in length. Previous studies have used enhancer tiling to panel (a). Gray areas in 'DeepMEL2' and 'CHEQ-seq tiling' tracks correspond to regions not covered by the tiling library. In the 'Enhancer tiling' tracks, red and green tiles are the mutated and WT tiles, respectively.    assess the activity of accessible chromatin regions and highlighted the importance of cell type-specific binding sites (Ernst et al., 2016;Wang et al., 2018). Nevertheless, such a short sequence does not always capture all the regulatory information found within the accessible region. Our detailed description of KIT_114DA and COL5A1_-17-D ATAC-seq peaks highlights the presence of both activator and repressor elements spread over distances larger than 190 bp that cannot be jointly evaluated with our tiled MPRA. This finding highlights the necessity of using sequences ~ 500 bp long to more confidently evaluate enhancers.
Many genes are regulated by an 'array' of enhancers thought to cooperate to regulate target gene expression by forming a chromatin hub (Di Giammartino et al., 2020;Gorkin et al., 2014;Zhu et al., 2021). For several loci in this study, we selected multiple regions near the same target gene. This experiment indicated that not all regions with a similar acetylation and ATAC-seq profile harbor enhancer activity in the reporter assay. As previously described by genomic editing of enhancers displaying similar chromatin features, enhancer cooperativity can be required to activate transcription (Hnisz et al., 2015;Huang et al., 2016). This makes the identification of relevant TFBSs the most informative method to find active enhancers. In that regard, the use of deep learning models like DeepMEL2 , even when they are trained only on ATAC-seq data, provides a powerful tool to identify key TFs and evaluate enhancer activity.
Intriguingly, we find a gradient of MES enhancer activity across our panel of cell lines, with MM001, intermediate, and MES lines having low, medium, and high activity, respectively. These enhancer profiles do not agree with their acetylation, ATAC-seq, and transcriptome profiles, which all show a strong intra-subtype homogeneity. This discrepancy was more pronounced with short 190 bp sequences.
Our results suggest that AP-1 is responsible for the activity of MES enhancers, agreeing with its predominance in allele-specific chromatin accessibility variants in melanoma . Note that AP-1 is also typically activated during the stress response (Hess et al., 2004). This may contribute to the limited activity of MES regions in MM001 following electroporation. Nevertheless, AP-1 enhancers show much higher activity in intermediate lines, which cannot be explained by a stress response alone. The observed activity of AP-1 enhancers in intermediate lines, even though they are neither accessible nor acetylated in these lines, could be due to the episomal nature of our reporter assay, whereby a (limited) AP-1 activity, as observed previously , may be sufficient to activate enhancers on a plasmid, but not in the endogenous locus. The use of a genome integrated reporter assay could help determine if the accessibility of the sequence is required for MES enhancer activity in MEL lines.
The analysis of MEL regions revealed that both SOX and MITF binding sites independently contribute to enhancer activity. TFAP binding sites did not show any influence on enhancer activity in the tested H3K27ac regions and showed only a small enrichment in SOX10-dependent ATAC-seq peaks displaying activity (NES: 1.19; pAdj: 0.046). These results contradict previous observations of TFAP2 and MITF co-occurrence in active regulatory elements in melanocytes and of TFAP2 pioneer function during cell fate commitment in the neural crest (Rothstein and Simoes-Costa, 2020;Seberg et al., 2017). These differences may be explained at least in part by the limited number of regions selected in our H3K27ac library or by a possible reduced influence of TFAP2 in the melanoma lines used in this study. Interestingly, we identify an interaction between SOX and MITF where the presence of at least one SOX dimer motif greatly increases the activity of enhancers containing MITF motifs. It is not clear if this interaction is due to the direct cooperation of SOX10 and MITF or the pioneer factor function of SOX family members (Julian et al., 2017). Cooperativity between these two TFs could be governed by strict rules requiring the presence of a SOX motif for activity or could be a more 'loose' requirement for heterotypic clusters of TFBSs (i.e. containing binding sites for multiple TFs) to maintain a high level of activity. Indeed, studies have shown that, in some cases, homotypic clusters of TFBSs yield lower expression than heterotypic clusters of TFBSs (Fiore and Cohen, 2016;Levo and Segal, 2014;White et al., 2016). While our exploration of synthetic SOX/MITF-based enhancers did not include sequences with fewer than six MITF and no SOX motifs, our enhancer tiling MPRA and luciferase assay did reveal such sequences to be active (e.g., KIT_114-D and SOX10_-35-D; Figure 2-figure supplement 2b and c; MLANA_5-I; Figure 5-figure supplement 3b). This suggests that a SOX motif is not required in enhancers with few MITF motifs. Alternatively, the pioneer function of SOX would suggest that the sequences cloned in the reporter vector are at least partially chromatinized. Riu et al. previously presented evidence supporting plasmid chromatinization, more precisely in regions of bacterial origin, resulting in silencing of a human expression cassette (Riu et al., 2007). However, it is still unclear if non-coding human regions could also be subject to chromatinization in a plasmid context and what role sequence length might play. We find a less pronounced influence of the SOX motif in short 190 bp compared to 259 bp sequences, and shorter MES enhancers also show more ectopic activation in intermediate lines, which may suggest that short enhancer sequences are being less efficiently chromatinized. Consequently, short enhancers might be characterized by increased accessibility in an extrachromosomal reporter, uncoupling measured activities from those demonstrated at the endogenous locus. Kheradpour et al. proposed a similar hypothesis by suggesting that DNA sequence features contained within the tested elements are partly responsible for establishing the endogenous chromatin state (Kheradpour et al., 2013). These considerations are also relevant in the context of extrachromosomal DNA driving oncogene overexpression in various human malignancies (Verhaak et al., 2019). The use of integrated MPRA or the measurement of chromatin accessibility in the plasmid would help to determine if enhancer sequences are subject to chromatin modification.
With this study, we show that melanoma subtype-specific enhancers can be identified, even down to sizes of 190 bp. This opens up promising avenues to use information gathered from MPRAs to identify small, specific enhancers for enhancer and gene therapies. Our assay combining SOX and MITF motifs showed that new MEL enhancers can be designed by addition of the corresponding TFBSs to a random background sequence and that the level of activity can be precisely controlled by varying the number of TFBSs. Using genomic regions confirmed as MEL enhancers and adjusting the motif sequences and numbers could further improve enhancer design.

MPRA design and cloning
H3K27ac-based library The selection of H3K27ac ChIP-seq regions harboring potential state-specific enhancer activity was done as follows (Figure 1-figure supplement 1a). Differentially enriched H3K27ac ChIP-seq peaks between MEL (6669) and MES (13,453) state, identified in our previous study , were used as input in i-cisTarget (Imrichová et al., 2015) to identify enriched TFBSs in specifically acetylated regions. For each melanoma subtype, the target genes of the top three enriched TFBSs were extracted and used to filter the list of differentially expressed genes identified in each subtype . Those genes were ordered by differential expression in MES vs. MEL lines, and the TSS of the top and bottom genes was used as the starting point to manually search for H3K27ac regions specific for MES and MEL, respectively. ChIP-seq tracks are visually investigated using the UCSC Genome Browser for regions displaying acetylation peaks only in one state close to those genes. Identified regions, within ~150 kb of the TSS, are associated with the differentially expressed gene, even if some of them are closer to or within another gene. Candidate regions were preferentially selected if they also overlapped with ChIP-seq peaks for SOX10 or MITF for MEL regions and JUN and JUNB (AP-1) for MES regions. 65 regions with a size <3 kb were manually selected, and primers to amplify them were designed using Primer3Plus within the flanking first 100 bp on each side of the sequence. 15 bp extensions were added to the primers to allow recombination with the region upstream of the SCP1 promoter in the CHEQ-seq vector (Verfaillie et al., 2016) via In-Fusion reaction (primer list is given in Supplementary file 2). Regions were named as follows: GENE_(-)000X with 'GENE' the name of the gene associated to the region, (-)000 the distance (in kb) between the gene TSS and the highest peak summit (H3K27ac) of the region, and 'X' the gene annotation of the region with 'P' for promoter, 'I' for intron, 'E' for exon, 'D' for distal, and '3U' for 3′ UTR. All PCR amplifications performed in this study make use of the KAPA HiFi HotStart ReadyMix (Roche, Basel, Switzerland). The amplification of the selected regions was done from MM074 genomic DNA in a 50 µl reaction. PCR fragments were then purified on a 0.8% agarose gel and the sequence was confirmed via Sanger sequencing. 53 regions could be successfully amplified (Supplementary  file 1). For the generation of the CHEQ-seq 5′ library, the CHEQ-seq vector containing a random 17 bp barcode (BC) upstream of the synthetic intron was linearized by inverse PCR with the primers 'CHEQ-seq_lin_5′_For' and 'CHEQ-seq_lin_5′_Rev,' resulting in a fragment with both ends overlapping with the primers designed to amplify the selected regions. Amplified regions were mixed in equimolar ratio and introduced in the CHEQ-seq vector (Addgene #174669) via In-Fusion reaction (Takara Bio, Kusatsu, Japan) with a vector to insert ratio of 1:2.
For the generation of the CHEQ-seq intron library, amplified regions were reamplified with primers containing adaptors to allow recombination within the intron of the CHEQ-seq vector ('H3K27Ac_ lib_intron_For', 'H3K27Ac_lib_intron_Rev'). The CHEQ-seq vector containing a random 18 bp BC downstream of the synthetic intron was linearized by inverse PCR with the primers 'CHEQ-seq_lin_ Intron_For' and 'CHEQ-seq_lin_Intron_Rev.' Reamplified regions were mixed in equimolar ratio and introduced in the CHEQ-seq vector via In-Fusion reaction with a vector to insert ratio of 1:2. The In-Fusion reactions were dialyzed against water in a 6 cm Petri dish with a membrane filter MF-Millipore 0.05 µm (Merck, Kenilworth, NJ) for 1 hr. Reactions were recovered from the membrane, and 2.5 µl of the reaction was transformed into 25 µl of Lucigen Endura ElectroCompetent Cells (Biosearch Technologies, Hoddesdon, UK). Transformed bacteria were cultured overnight in a shaker before maxiprep.

ATAC-based library
The sequences constituting the ATAC-based library were selected from ATAC-seq peaks from MM001, MM029, MM047, MM057, MM074, MM087, and MM099 overlapping with H3K27ac library regions. Among this subset, only peaks that overlap with regions identified as active in the H3K27ac library or that are assigned as MEL-or MES-specific regulatory regions (respectively represented by regions belonging to topics 4 and 7) in our previously published cisTopic analysis of ATAC-seq data from 16 human melanoma cell lines were retained . The final selection included 46 ATAC-seq peaks (Supplementary file 1). When two ATAC peaks were selected within the same H3K27ac region, an 'A' and a 'B' were added to the region name to distinguish them (e.g., KIT_114DA and KIT_114DB). The summit of each peak was extended by 250 bp on both sides to generate a 501 bp sequence. At the 3′ end of the sequence, a 16 bp spacer sequence was added, followed by an 8 bp BC specific for each selected region. 38 sequences were synthesized by Twist Bioscience (South San Francisco, CA). This includes four sequences where an A or a T was substituted to break a polyA or polyT, respectively, in order to make the sequences compatible with the synthesis process. Primers were designed for the remaining eight sequences to amplify them from genomic DNA ('ATAC-seq-based library' in Supplementary file 2). Only one PCR, that is, for the amplification of the SERPINE1_-27-D sequence, was not successful. The CHEQ-seq vector containing a random 17 bp BC upstream of the synthetic intron and the STARR-seq ORI vector (Addgene #99296; Muerdter et al., 2018) were linearized via inverse PCR ('ATAC-seq-based library' in Supplementary file 2). The individual sequences were pooled together in equimolar ratio, and NEBuilder (New England Biolabs, Ipswich, MA), with a vector to insert ratio of 1:2, was used to introduce them in the vectors. Dialysis and transformation were performed similarly to the H3K27ac library. Before culture for maxiprep, 1:100,000 of the transformed bacteria was plated on an LB-agar dish with carbenicillin to estimate the complexity of the cloned library. A volume of bacteria corresponding to a complexity of 3500 BCs per enhancer was put in culture for maxiprep.

Enhancer tiling libraries
Using a custom script, tiles were generated by selecting 190 bp from the start (library A) or position 11 (library B) of the H3K27ac selected regions and by switching every 20 bp in 3′ direction. Tile generation is stopped when the position of the final nucleotide of the tile is superior to the final nucleotide of the H3K27ac region. To generate mutated tiles, we first scanned all selected H3K27ac regions with CBust (Frith et al., 2003) and the following PWM separately: transfac_pro__M08838 (SOX10 dimer); homer__RTCATGTGAC_MITF; transfac_pro__M01859 (TFAP monomer); tfdimers__MD00038 (TFAP dimer); tfdimers__MD00591 (SOX10-TFAP dimer); hocomoco__JUN_f1; homer__NATGAST-CABNN_Fosl2; cisbp__M5907 (TEAD). Using a custom script, motifs were mutated on their two or four most important nucleotides for monomers and dimers, respectively. In order to keep the number of tiles manageable, only highly conserved motifs were mutated. This resulted in some potentially functional sites (presence of TF ChIP-seq peak) not being recognized by PWM scores. Mutated tiles for each motif were generated separately so that all occurrences of a single motif are present. Shuffled negative control tiles were generated by shuffling all WT and mutated sequences with uShuffle (Jiang et al., 2008). Sequences containing a stretch of the same nucleotide for six or more nucleotides were filtered out. The remaining tiles were scored with CBust using the same PWMs and parameters as before, and 800 tiles containing no motifs were selected at random. In total, libraries A and B contained 7412 and 7393 tiles, respectively (Supplementary file 1).
Adaptor sequences were added to the tiles: 'Adaptor_LibA_5'' and 'Adaptor_LibA_3'' for library A and 'Adaptor_LibB_5'' and 'Adaptor_LibB_3'' for library B. The use of different adaptors for each library will result in the insertion of the library B 20 bp downstream of the library A, providing a slightly different surrounding context that combined with a high number of barcodes per enhancer aims at reducing experimental noise. Final libraries were synthesized via Agilent's Oligonucleotide Library Synthesis Technology (Santa Clara, CA).
Oligonucleotide libraries were resuspended in endotoxin-free TE buffer pH 8 to a final concentration of 20 nM. For each library, 10 PCR reactions were performed with 2 µl of resuspended library for 12 cycles with primers 'Lib_A_amp_For' and 'Lib_A_amp_Rev' or 'Lib_B_amp_For' and 'Lib_B_ amp_Rev.' The PCR product was first cleaned up using MinElute (QIAGEN) with five PCR reactions per column, then pooled together and cleaned up a second time with 1.6× SPRI beads (Beckman Coulter, Brea, CA). The CHEQ-seq vector containing a random 17 bp BC upstream of the synthetic intron was linearized via inverse PCR with primers 'CHEQseq_lin_A_For' and 'CHEQseq_lin_A_Rev' or 'CHEQseq_lin_B_For' and 'CHEQseq_lin_B_Rev.' Amplified libraries and the corresponding linearized vector were combined in an NEBuilder reaction with a vector to insert ratio of 1:3.25. Dialysis, transformation, and maxiprep were performed similarly to the H3K27ac library.

SOX10 knockdown-based library
To define a set of SOX10-dependent MEL enhancers, we used public OmniATAC-seq data during a time series of SOX10 knockdown (0, 24, 48, 72 hr) on two melanoma cells lines (MM057 and MM087) (GSE114557; Bravo . We generated 50 simulated single cells per condition by randomly sampling 50,000 reads per cell. Candidate regulator regions were defined by peak calling with MACS2 (v.2.0.10) in each of the bulk samples and merging the condition-specific peaks using mergeBed (part of BEDtools, v.2.23.0), and blacklisted regions were removed using https:// sites. google. com/ site/ anshulkundaje/ projects/ blacklists (hg19). We ran cisTopic (v0.2.1; Bravo  (parameters: α = 50/T, β = 0.1, burn-in iterations = 500, recording iterations = 1000) for models with a number of topics between 2 and 25. The best model was selected on the basis of the highest log-likelihood, resulting in 16 topics. We binarized the topics using a probability threshold of 0.99 and continued with topic 11, which contained regions that are accessible in baseline conditions but lose accessibility following SOX10 knockdown. A total of 1461 enhancers were chosen to be tested. To increase the resolution of the library, we tiled the sequences to 190 bp using a 120 bp sliding window across the enhancers, resulting in 6696 tiles. In addition, 100 shuffled negative control sequences were generated similarly to the tiling libraries (Supplementary file 1). Vector-specific adapters were added to the sequences, and the tiles were synthesized via Agilent's Oligonucleotide Library Synthesis Technology.
The library was amplified via PCR using primers 'Lib_A_amp_For' and 'Lib_A_amp_Rev' and cloned into the linearized CHEQ-seq plasmid by following the same procedure as for the enhancer tiling library A. Dialysis, transformation, and maxiprep were performed similarly to the H3K27ac library.

Synthetic combinations of SOX and MITF library
Random 259 bp sequences were generated using SMS2-Random DNA Sequence tool (Stothard, 2000). Two sequences, displaying no enrichment in any of the topics defined in our previously published cisTopic analysis , were selected as background sequences. SOX and MITF motifs (ACAAAGACGGCTTTGT and CACGTG, respectively) were inserted in the sequence with a motif in the center and the other motifs placed upstream and downstream with a distance of 25, 50, or 75 bp (Figure 5e, top panel). A complete list of motif combinations can be found in Supplementary file 1. 200 negative control shuffled sequences were generated with uShuffle as described previously. A 11 bp BC specific for each enhancer was placed in 5′ position of the sequence. Barcoded enhancers were finally flanked with the adaptors GAGCATGCACCGGTG and CGCTTCGAGCAGACA in 5′ and 3′, respectively. The final library was synthesized by Twist Bioscience as an Oligo Pool. The oligonucleotide library was resuspended according to the manufacturer's recommendation and amplified via PCR with the primers 'CHEQ_comb_amp_For' and 'CHEQ_comb_amp_Rev.' The CHEQ-seq vector with a 17 bp random BC upstream of the intron was linearized via inverted PCR with the primers 'CHEQ_comb_lin_For' and 'CHEQ_comb_lin_Rev.' NEBuilder was then used to combine the library with the linearized vector with a vector to insert ratio of 1:2. Dialysis and transformation were performed similarly to the H3K27ac library. Before culture for maxiprep, 1:100,000 of the transformed bacteria was plated on an LB-agar dish with carbenicillin to estimate the complexity of the cloned library. A volume of bacteria corresponding to a complexity of 500 BCs per enhancer was put in culture for maxiprep.

Enhancer-barcode assignment CHEQ-seq 5′/intron for H3K27ac ChIP-seq regions
The part of the plasmid extending from the enhancer till the random BC was amplified via PCR with primers 'PacBio_5′_For,' 'PacBio_5′_Rev,' 'PacBio_Intron_For,' and 'PacBio_Intron_Rev' for CHEQ-seq 5′ and CHEQ-seq intron, respectively. Gel extraction was performed to isolate the PCR product with the correct size range using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel, Düren, Germany). PacBio sequencing library preparation for both libraries was done by the Genomics Core Leuven (KU Leuven). Sequencing was done with a PacBio Sequel for long-read sequencing (Pacific Biosciences, Menlo Park, CA) with both libraries sequenced with one SMRT cell. We obtained 66,407 and 105,590 reads with three passes for CHEQ-seq 5′ and CHEQ-seq intron, respectively.
Enhancer-BC assignment was done with a custom script. Briefly, enhancers and random BCs were independently extracted from the reads with Cutadapt (Martin, 2011). Enhancer sequences were mapped with Minimap2  and a custom genome containing all the cloned regions, and only MAPQ ≥ 4 were retained. Mapped enhancers were linked back to random BCs of the vector. Following assignment, 46 (86.8%) and 50 (94.3%) sequences could be identified in the CHEQ-seq 5′ and intron library, respectively. The CHEQ-seq 5′ library displayed an average of 31.9 BCs per enhancer while the intron library displayed an average of 604.5 BCs per enhancer.

CHEQ-seq for ATAC-seq regions
A PCR amplification of the enhancer-specific BC, together with the random BC, was done with the primers 'Enh-BC_ATAC_Stag = X_For' and 'Enh-BC_ATAC_Rev.' Illumina sequencing adaptors were added during a second round of PCR with the primers 'i5_Indexing_For' and 'i7_Indexing_Rev.' After sequencing in NovaSeq600 for 50 cycles in read 1 and 49 cycles in read 2, enhancer BCs and random BCs were extracted from read 1 and read 2, respectively, with Cutadapt before being filtered for quality (Q > 30). Following assignment, 44 (95.7%) sequences could be identified, with an average of 34,815 BCs per enhancer.

CHEQ-seq for SOX-MITF synthetic combinations
This CHEQ-seq library was prepared for sequencing similarly to the CHEQ-seq enhancer tiling A sublibrary. After sequencing in NovaSeq600 for 50 cycles in read 1 and 49 cycles in read 2, enhancer BCs and random BCs were extracted from read 1 and read 2, respectively, with Cutadapt before being filtered for quality (Q > 30). Following assignment, 249 (99.6%) sequences could be identified, with an average of 276 BCs per enhancer.

Cell lines
The cell lines used in this study were patient-derived melanoma lines obtained from the laboratory of Pr. Ghanem-Elias Ghanem (Institut Jules Bordet, ULB, Belgium). All lines have been identified by RNA-seq and ATAC-seq. Mycoplasma test was performed systematically before each experiment that would provide data for this study.

MPRA
The MPRA libraries were electroporated in 4-6 million cells each using the Nucleofector 2b or 4D (Lonza, Basel, Switzerland) with 6 µg of plasmid DNA and program T-030 or Y-001 and DS-132, EH-116, or CM-134, respectively. For the CHEQ-seq 5′ and intron H3K27ac and CHEQ-seq ATAC libraries, one replicate was performed per cell line except for MM087, where three replicates were performed. For STARR-seq ATAC and the CHEQ-seq SOX-MITF combination library, one replicate was performed per cell line. For the CHEQ-seq enhancer tiling libraries, two replicates were performed per cell line except for MM087, where four replicates were performed. For the CHEQ-seq SOX10-KD library, three replicates were performed in MM087. Medium was changed 24 hr after electroporation. 48 hr post-electroporation, cells were detached from the plate using trypsin (Thermo Fisher Scientific). One-fifth of the cells was used for plasmid DNA extraction (QIAGEN, Hilden, Germany). The remaining cells underwent RNA extraction using the innuPREP RNA Mini Kit 2.0 (Analytik Jena, Jena, Germany), followed by mRNA isolation using the Dynabeads mRNA purification kit (Ambion, Austin, TX) and cDNA synthesis using the GoScript RT kit and oligo dT primer (Promega, Madison, WI). For STARR-seq samples, a junction PCR was done for 12 cycles with the primers 'STARR-seq_Junc-tion_cDNA_For' or 'STARR-seq_Junction_plasmid_For' and 'STARR-seq_Junction_Rev' followed by a PCR to amplify the enhancer for four cycles with the primers 'STARR-seq_enhancer_Stag = X_For' and 'STARR-seq_enhancer_Rev.' For CHEQ-seq samples, a PCR was performed to amplify the random BC from the plasmid DNA or cDNA samples for 16 cycles with the primer pairs 'Cheq-seq_barcode_ Intron_round1_Stag = 0_For'/'Cheq-seq_barcode_Intron_round1_Rev' for the CHEQ-seq intron H3K27ac library or 'Cheq-seq_barcode_5'_round1_Stag = X_For'/'Cheq-seq_barcode_5'_round1_ Rev' for all other libraries. To add Illumina sequencing adaptors, all samples were finally amplified by PCR for six cycles with the primers 'i5_Indexing_For' and 'i7_Indexing_Rev.' After confirmation of the fragment size with a Bioanalyzer, samples were sequenced at the Genomics Core Leuven (KU Leuven).

MPRA analysis Read processing and BC identification
Read processing following sequencing was performed with a custom bash script. First, random BCs were extracted using Cutadapt and filtered for a quality (Q-score) > 30. The number of reads per uniquely identified BC was counted and the name of the enhancer was assigned to the BC sequence based on the enhancer-BC assignment list for each library. Unassigned BCs were filtered out to obtain a final data frame containing the name of the enhancer, the BC sequence, and the number of reads.

Estimation of enhancer activity
Enhancer activity from MPRA was estimated via a custom R script (RStudio, R version 3.6.0) (Mauduit, 2021a; https:// github. com/ aertslab/ Melanoma_ MPRA_ paper; Mauduit, 2021b, copy archived at swh:1:rev:a90ee6e58d16fbaf367e4e68dc3a3d6fd3bc9e85). Enhancers were first filtered based on the number of BCs identified in the sequencing reads. Thresholds of 5 (for the H3K37ac libraries and enhancer tiling libraries), 10 (for ATAC libraries), or 20 (for the SOX-MITF combinations library) BCs per enhancer were selected based on the complexity of the library and the sequencing saturation of the enhancer-BC assignment samples. For the remaining enhancers, BC counts were aggregated per enhancer and then a count per million (CPM) normalization was applied. Plasmid (input) and cDNA (output) samples were merged by keeping only enhancers remaining in both samples after filtering. Input normalization was done by dividing CPM normalized cDNA counts by CPM normalized plasmid counts, resulting in an FC value. For libraries with shuffled sequences, a basal expression normalization was further applied by dividing the FC value of the enhancer by the median FC value of the shuffled sequences. The MPRAnalyze method was tried as an alternative to our method and gave nearly identical results in the case of the H3K27ac CHEQ-seq intron library (mean Pearson's correlation on log2 FC r = 0.96; Ashuach et al., 2019). The high computational demand when the number of BCs is high made it inappropriate for the analysis of most libraries. For consistency, all assays were analyzed with our aggregated BC method, which showed more consistency with low complexity libraries and more scalability with very complex libraries. In order to distinguish active and inactive enhancers, a Gaussian fit of the shuffled negative control values was performed with the 'robustbase 0.93-6' package, and a p-value and Benjamini-Hochberg adjusted p-value was calculated based on that Gaussian fit for all enhancers with the 'stats 3.6.0' package. An enhancer is considered active if its adjusted p-value is <0.05. For the H3K27ac and ATAC-seq libraries, which did not contain shuffled sequences, regions containing no active tiles in the enhancer tiling MPRAs and displaying low activity in both H3K27ac and ATAC-seq libraries MPRAs were selected as negative controls and used to fit the Gaussian curve. For the CHEQ-seq SOX10-KD library, DEseq2 (Love et al., 2014) was used for estimating enhancer activity.

Sample exclusion
Despite the high number of identified enhancer-random barcode couples during the assignment step for the CHEQ-seq enhancer tiling libraries A and B, the complexity of those libraries was so high that less than 3% of the barcodes could be identified following MPRA. This resulted in a low enhancer coverage or an insufficient number of remaining reads to identify enhancer activity in many samples. The 'OutlierD 1.48.0' R package was used to identify outliers. Samples that displayed <1% of outliers or had too low coverage (<450 tiles) were excluded.

Motif enrichment and GSEA
Differential motif enrichment between the active tiles and the remaining tiles for the enhancer tiling libraries and the SOX10-KD library was performed via HOMER findMotifs (Heinz et al., 2010). For the enhancer tiling libraries, differential motif enrichment was also performed between the top 1/4th accessible tiles and the remaining tiles. MITF, SOX10, and TFAP2A ChIP-seq peaks Seberg et al., 2017) were intersected with the tiles using BEDtools. A GSEA was performed using the R package 'fgsea' (Korotkevich et al., 2021) by ranking the tiles according to their log2 FC and providing the overlaps of the ChIP-seq peaks with the tiles as gene sets.

Deep learning predictions and nucleotide contribution visualization
Enhancer sequences for the enhancer tiling and SOX-MITF combinations libraries were scored with the DeepMEL2+ GABPA version of DeepMEL2 as described previously . To accommodate for the 500 bp required length of the sequences to be scored by DeepMEL2, the vector sequence flanking the insertion site of the enhancer was added on both sides of the sequence. A threshold of 0.1 was defined to distinguish between low and high prediction score for topics 16, 17, and 19 as it approaches the mean score +2 * standard deviation of those topics.
Visualization of nucleotide contribution to DeepMEL2 prediction score was done with DeepExplainer as described previously Lundberg and Lee, 2017).

Validation of TFBS functionality with luciferase assay
Three MES-specific ATAC regions (ABCC3_11-I, COL5A1_-17-D, GPR39_23-I), three MEL-specific ATAC regions (MLANA_5-I, IRF4_4-I, TYR_-9-D), and two inactive ATAC regions (SOX10_15-3UA, SOX10_-56DA) were selected. For each sequence, TFBSs for AP-1, TEAD, MITF, SOX, and ZEB were manually identified from the DeepExplainer profile and mutated the same way as for the enhancer tiling library. For ZEB binding sites, the CAGGTG site was mutated into CGGATG. A total of 24 sequences were ordered from Twist Bioscience pre-cloned in the pTwist ENTR plasmid (Supplementary file 1). Regions were introduced in the pGL4.23-GW luciferase reporter plasmid (Promega) via Gateway LR recombination reaction (Invitrogen) and transformed into Stellar chemically competent bacteria (Takara). Plasmid minipreps were performed using the NucleoSpin Plasmid Transfectiongrade Mini kit (Macherey-Nagel) and sequenced with Sanger sequencing to confirm the correct insertion of the regions in the destination plasmid.
MM001 and MM099 were seeded in 24-well plates and transfected with 400 ng pGL4.23-Enhancer plasmid + 40 ng pRL-TK renilla plasmid (Promega) with Lipofectamine 2000. One day after transfection, luciferase activity was measured via the Dual-Luciferase Reporter Assay System (Promega) by following the manufacturer's protocol. Briefly, cells were lysed with 100 µl of Passive Lysis Buffer for 15 min at 500 rpm. 20 µl of the lysate were transferred in duplicate in a well of an OptiPlate-96 HB (PerkinElmer, Waltham, MA) and 100 µl of Luciferase Assay Reagent II were added in each well. Luciferase-generated luminescence was measured on a Victor X luminometer (PerkinElmer). 100 µl of the Stop & Glo Reagent was added to each well, and the luminescence was measured again in order to record renilla activity. Luciferase activity was estimated by calculating the ratio luciferase/renilla. This value was further normalized by the ratio calculated on blank wells containing only reagents. Three bedGraphs using as scaling parameter (-scale) 1 × 106/(number of non-mitochondrial mapping reads); or made by bamCoverage (deepTools; v3.3.1) .

Data access
MPRA data generated for this study have been submitted to the NCBI Gene Expression Omnibus (GEO, https://www. ncbi. nlm. nih. gov/ geo/) under accession number GSE180879.