Modeling islet enhancers using deep learning identifies candidate causal variants at loci associated with T2D and glycemic traits

Significance Identifying the genomic and molecular effects of disease-associated genetic variants is a central challenge in translating signals from genetic association studies to insights into the causes of disease. Such effects can be defined by targeted functional studies, but these studies are difficult to scale across the thousands of candidate causal variants routinely identified by genetic association studies. To help solve this problem, we developed a method to predict the effects of genetic variation on enhancers. We apply this method to model pancreatic islet enhancers, demonstrate that the model is accurate, and show that the predicted effects of genetic variants on enhancers can help identify candidate causal variants for targeted functional studies.


Islet genomic and epigenomic profiles
We used previously published ATAC-seq data from 33 islets, consisting of 64,129 peaks (3). As described in Viñuela et al. (3), since these islets originate from multiple studies, reads were downsampled to the minimum read depth across samples (27,994,993 reads) and merged across studies. Peaks were called across all samples (via the merged bed file) as well as within samples. The final 64,129 ATAC-seq peak calls were those peaks from the merged bed file that occurred in >17 of the individual samples ATAC-seq peak calls.
We reprocessed previously published H3K27ac ChIP-seq data from two islets (4). Briefly, in order to avoid bias from one specific sample, we downsampled the number of reads from each sample to the minimum read depth across samples (26,369,910 reads) and merged reads from both samples. We performed a similar process for the input controls for each sample, downsampling to 23,702,108 reads per sample before merging reads across samples. Next, we aligned reads and identified peaks using the ENCODE ChIP-seq processing pipeline (https://github.com/ENCODE-DCC/chip-seq-pipeline2, v1.2.0) with default parameters. We excluded the peaks overlapping Duke blacklisted regions (UCSC browser tables wgEncodeDacMapabilityConsensusExcludable and wgEncodeDacMapabilityConsensusExcludable), resulting in 87,007 H3K27ac peaks.

Enhancer definitions
We identified 9,918 islet enhancers by selecting ATAC-seq peaks that overlapped an islet H3K27ac ChIP-seq peak (≥1bp) and expanding 1kb up and down the genome from the middle position of the ATAC-seq peak. We considered the entire 2kb region, centered on the ATAC-seq peak, as an enhancer. We removed promoter regions, defined as 1.5kb upstream and 0.5kb downstream (2kb in total) of the transcription start site of known genes from the UCSC genome browser (ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz).
To define enhancers in HepG2 and K562, we used the ENCODE data and performed the same procedure, using ATAC-seq peaks for HepG2 (n=279,739; ENCSR042AWH) and K562 (n=269,800; ENCSR868FGK) from ENCODE. We identified 21,162 HepG2 and 25,357 K562 enhancers in total.

Two phase deep learning model to predict enhancers
We developed a two phase deep learning (DL) classifier, TREDNet, based on convolutional neural networks (CNNs; implemented in keras v2.1.2 and tensorflow-GPU v1.4.1) to predict enhancers from DNA sequence. In the first phase, we used a model with six convolutional layers (~143 million trainable parameters; Table S4) to predict 1,924 different genomic and epigenomic features simultaneously for a 2kb genomic region. These features included DHSs, TF ChIP-seq peaks, and histone mark ChIP-seq peaks from the ENCODE (1) and NIH Roadmap (2) studies. We tiled the entire human genome using a sliding window of length 2kb and the step length of 200bp. We selected those segments that overlapped at least one of the 1,924 epigenomic features and trained the DL model using these segments. For training, we fit the model on all autosomes, except for chromosomes 8 and 9. During training, we used signals on chromosome 7 as a validation set. We used signals on chromosomes 8 and 9 for testing the final model, evaluating both the area under the receiver operating characteristic curve (auROC) and area under the precision recall curve (auPRC; Fig. S1). Due to the cell/tissue type-specific nature of the features, the class distribution (positive/negative) in the training set was imbalanced across the features, with only 2% of the dataset being positive cases on average. To evaluate the optimal sequence window size, we randomly selected 10% of the training data and re-fit the model using different window sizes ranging from 800bp-4kb and the same training/testing strategy (i.e., chromosome 7 as a validation set during training and chromosomes 8 and 9 for testing the final model). We evaluated model performance using auROC and auPRC as metrics, testing for improved model performance (defined as P-value<0.05) across progressively larger window sizes using the Wilcoxon rank sum test (Fig. S2). We found that the models improved at each larger window size up to 2kb, after which there was no statistical improvement. Using the 2kb model, we compared the TREDNet phase one model to previously published, similar epigenomic feature prediction models-ExPecto (5), DeepSEA (6), and Basset (7)-on the data used for TREDNet with the same testing strategy (i.e., testing performance on chromosomes 8 and 9; Fig. S1).
For the second phase (enhancer prediction), we fit three smaller models to predict pancreatic islet, HepG2, and K562 enhancers (one for each cell/tissue type) from the output of the first model (a vector of the 1,924 epigenomic predictions for 2kb sequence segments). The second model consisted of two convolutional layers resulting in~12 million trainable parameters (Table S5). For each biospecimen, we treated the enhancer regions as the positive set (encoded as 1); for the negative set (encoded as 0), we randomly sampled 10x of the number of enhancers from accessible chromatin regions (i.e., DHSs) across all biospecimens in NIH Roadmap, excluding the enhancer regions of the target biospecimen. For training, we adopted a similar strategy as phase one, using chromosome 7 to validate during training and withholding chromosomes 8 and 9 to evaluate the final model. We tested the model's performance (auROC and auPRC) using enhancers from chromosomes 8 and 9 withheld from training ( Fig. 2A). To benchmark the TREDNet phase two model, we tested previously described enhancer prediction models-BiRen (8), Tan et al. (9), and SVM (10)-on the data used for TREDNet with the same testing strategy (i.e., testing performance on chromosomes 8 and 9). Since SVM did not distribute pre-trained models, we trained SVM using the same data (i.e., enhancer definitions) and training strategy as for TREDNet. For BiRen and Tan et al., we used the pre-trained models which were not cell type specific. For Tan et al., the method produces a binary output for each input region based on the scores generated by five different models. To generate auROC and auPRC values, we used the scores generated by each of the five Tan et al. models and picked the best performing one. each DNA segment and compared these predictions to the MPRA signals, using the normalized expression scores provided by the authors. We repeated this process for BiRen, Tan et al., and SVM.
Finally, Kircher et al. (12) performed saturated mutagenesis experiments targeting several regulatory elements associated with disease. We used data from this study from enhancer regions that tested for effects of mutations in regions near ZFAND3 and TCF7L2 in MIN6 and SORT1 in HepG2. Using the log 2 allelic MPRA expression effect estimates distributed by the authors, we filtered for alleles with >=10 unique barcode tags (as done by the authors) and for alleles that showed a statistical difference in the MPRA expression read out (FDR<5%, Benjamini-Hochberg procedure (14)). We applied the biologically relevant TREDNet model (i.e., the islet model for MIN6, the HepG2 model for HepG2) calculated the log 2 fold change of enhancer probability scores for the alternate allele compared to the reference allele, comparing these predictions to the MPRA signals. We repeated this process for Tan et al. and SVM. For BiRen, because the method takes as input coordinates of genomic regions, not DNA sequences, we used the enhancer probabilities of the overall regions.

Calculation of enhancer damage scores from in silico saturated mutagenesis
For each biospecimen, we performed in silico saturated mutagenesis of enhancer regions to evaluate the effects of mutations on the overall enhancer probability score. For each 2kb enhancer region (see "Enhancer definitions"), we calculated an enhancer damage (ED) score iteratively for each nucleotide position against the GRCh37 reference sequence (i.e., we mutate each nucleotide to all possible mutations but keep the remaining 1,999 nucleotide sequence the same as the reference): where the e term represents the probability that the 2kb sequence is an enhancer, reference indicates the GRCh37 reference nucleotide, and alternate indicates a non-reference nucleotide. This ED score, generated for each bp in the 2kb enhancer region, predicts the effects of mutations at a specific base on the overall enhancer probability of the region, such that a positive score indicates a negative change in the enhancer probability (enhancer damaging) and a negative score indicates a positive change in the enhancer probability (enhancer strengthening).

Identification and analysis of transcription factor binding sites (TFBSs)
For HepG2 and K562, we used TF ChIP-seq data spanning 77 TFs in HepG2 and 150 TFs in K562 (1). For each TF, in order to extract the TFBS position, we used HOMER v4.11 (15) to scan motifs found in the database packaged with HOMER (http://homer.ucsd.edu/homer/motif/motifDatabase.html), producing a list of predicted TFBSs for each motif in the database. We ranked the motifs in decreasing order of their enrichment P-value and kept the TF of the most enriched motif.
Because islets do not have as comprehensive TF ChIP-seq profiles as HepG2 and K562, we used predicted TFBSs derived from ATAC-seq footprints described previously (16). Briefly, for two islet ATAC-seq samples, Varshney et al. scanned for potential transcription factor binding sites 6 (TFBSs) in a haplotype-aware manner using the "find individual motif occurrences" (FIMO) tool (17) with position weight matrices (PWMs) from a previously described database (18) consisting of PWMs from ENCODE (19), JASPAR (20), and Jolma et al. (21). Next, Varshney et al. used CENTIPEDE (22) to call footprints in the islets ATAC-seq data, considering a given motif occurrence bound if both the CENTIPEDE posterior probability was ≥0.99 and the motif's coordinates were fully contained within an ATAC-seq peak.
For islets, because many of the predicted TFs share similar motif patterns and result in overlapping predicted TFBSs, we reduced redundancy by aggregating islet TF footprints with similar binding motifs. For each TF, we calculated the average ED score across all nucleotides within the predicted binding sites. For the TFs with a positive average ED score across all predicted TFBSs (690 TFs), we selected 156 TFs where the average lower boundary of 95% confidence interval (CI) the binding region was greater than the upper boundary of the CI of the flanking region (defined as 10bp on each side of the TFBS). For the TFs with a negative average ED score across all predicted TFBSs (49 TFs), we selected 10 TFs where the ED scores within the CI were all negative. Using these 166 TFs, we iteratively merged TFs if >40% of their predicted binding sites overlapped and the overlapping regions were greater than half of either binding site. In total, this procedure resulted in 100 non-redundant TFs groups for islets, which we used to analyze ED scores in islet enhancers.
We used TFBSs in islets, K562, and HepG2 to evaluate if ED scores mark TFBSs by comparing the absolute value of ED scores within TFBSs to 20bp regions immediately flanking each TFBS as well as randomly sampled enhancer regions. To generate random ED scores, we shuffled the TFBSs within enhancer regions for each biospecimen 10 times and recorded the absolute value of ED scores within these regions.
In addition, to compare ED scores to evolutionary conservation profiles at TFBSs, we calculated the information content (IC) at each position of PWMs for each TF. For evolutionary conservation profiles, we used phyloP scores generated from 46 vertebrate species (23,24). Across all TF PWMs for each biospecimen, we calculated the correlation (Spearman's rho) between the IC of each PWM position and delta/phyloP scores.

Detection of enhancer damaging regions and enhancer strengthening regions
For each biospecimen, we trained two DL models, one to predict enhancer damaging regions (EDRs) and one to predict enhancer strengthening regions (ESRs) within enhancers from ED score enhancer profiles (six models total across all three cell lines).
To train the EDR/ESR classifiers for each biospecimen, we annotated nucleotides within enhancers with 1 (positive set) if the nucleotide overlapped a TFBS and otherwise 0 (control set). We subsequently excluded from the control set (i.e., enhancer nucleotides annotated as 0) genomic regions (i) between any two TFBSs in an enhancer (even if these TFBSs are on the opposite ends of the enhancer), (ii) within 10bp of a TFBS, (iii) within 20bp of an enhancer boundary, and (iv) in an enhancer of less than 50bp. For HepG2 and K562, we used all TFBSs, described in "Identification and analysis of transcription factor binding sites". For islets, we used the TFBSs from the 166 TFBSs derived from ATAC-seq footprints, described in "Identification and analysis of transcription factor binding sites". For each enhancer nucleotide, we derived a series of features from ED score predictions across various window sizes and used these features to predict the location of TFBSs within the enhancer sequence (i.e., the positive or control status of each nucleotide within the enhancer). For each nucleotide, we scanned a series of windows from 10bp in length to 1bp in length. For each window length >7bp, we defined a core region as the 6bp in the center of the window and calculated the following metrics: (i) the average ED score of nucleotides within the window, (ii) the maximum ED score of nucleotides within the window, (iii) the fraction of nucleotides within the window with a positive ED score, and (iv) the fraction of nucleotides within the core region with a positive ED score. For windows of length <6bp, we repeated the same procedure, but using a core region equal to the window size. For each window length, we iteratively scanned around the target nucleotide such that the target nucleotide occupied every position within the window (i.e., we incremented the window position by one base pair for each iteration resulting in 10 sliding windows for a window size of 10bp). For a single nucleotide, the result of this procedure was a series of four metrics defined across 55 windows (of size 10bp to 1bp). We concatenated these metrics together across all windows to generate a vector of 220 values for each nucleotide, representing a comprehensive description of a local mutational impact of each nucleotide on an enhancer. After iteratively performing this procedure across all nucleotides within enhancers, we then fit a two layer CNN (implemented in keras v2.0.8; architecture described in Table S6) to predict the TFBS annotation status (0 or 1) from the 220 values calculated for each nucleotide. During training, we randomly sampled 20% of the input data as a validation set and excluded chromosomes 8 and 9 entirely. We used signals on chromosomes 8 and 9 for testing the final model, evaluating both the auROC and auPRC (Fig. S7).
To train the EDR classifier, we selected for TFBSs where the average ED score was positive. To train the ESR classifier, we performed the same procedure, except we selected for TFBSs where the average ED score was negative.
Both the EDR and ESR models were highly accurate (Fig. S7) and resulted in a score for each nucleotide within an enhancer representing the likelihood that the nucleotide overlaps a TFBS. We used these scores to label regions as EDRs and ESRs. For the EDR models, we used the peak model for each biospecimen and labeled any span of nucleotides (one after the other) of length >=3 with a likelihood score >=0.312 as an EDR. We repeated the same procedure with the ESR models, calling regions with >=3 consecutive predicted negative EDR nucleotides and a likelihood score >=0.152 as ESRs.
We compared the TREDNet TFBS prediction model from enhancer damage scores to a model that predicted TFBSs directly from DNA sequence. For the DNA sequence TFBS prediction, we fit a single multi-task model to predict the TFBS annotation status (0 or 1) used in the EDR/ESR models from one-hot encoded DNA sequence of the 2kb region centered on the TFBSs (model architecture the same as TREDNet phase one). We fit a separate model for each biospecimen (i.e., islet, HepG2, K562), randomly sampling 20% of the TFBSs, and compared the resulting model to the TREDNet TFBS prediction model using the auROC and auPRC as metrics with the same training validation and test scheme as used in the ED score model (Fig. S8).
To validate the EDR and ESR predictions as regulatory sites within enhancer sequences, we compared the density of SNPs reported to have an allelic effect on transcription in K562 and HepG2 8 MPRA experiments (25) in DHSs, enhancer regions, and EDR/ESR regions ranked by their average ED score. For comparisons, we calculated the number of MPRA validated SNPs in each genomic region divided by the total length of the region.

Calculation and validation of islet enhancer perturbation scores
We generated a catalog of predicted effects of 67,226,155 SNPs on islet enhancers using the genome Aggregation Database (gnomAD) v3.0 (26), including SNPs with a minor allele frequency (MAF) >=0.0001. Since gnomAD v3.0 uses GRCh38 coordinates, we lifted these coordinates over to GRCh37 to match the data used to train the model. Across all SNPs, we calculated islet enhancer perturbation (IEP) scores using a two phase procedure. First for each SNP, we calculated the probability, e, of either allele falling in an enhancer, given 2kb of the flanking reference sequence (GRCh37) surrounding the SNP. Next, we generated IEP scores: where the e term represents the probability of an allele residing in an islet enhancer. For enrichment calculations and subsequent binning, we calculated the percentile rank of IEP scores. To validate IEP scores at the genome-wide level, we used previously published islet genetic studies spanning gene/exon expression (3), chromatin accessibility (27), and MPRA data generated from the MIN6 mouse pancreatic islet beta cell line (28). From these data, we generated genome annotations that were subsequently used for enrichment calculations. For the genetic association data (eQTLs, exonQTLs, and caQTLs), we selected all QTLs and marked the genomic location of all SNPs in LD (r 2 >0.8) with the lead SNP (minimum P-value at the locus). For the MIN6 MPRA data, we marked the genomic location of SNPs reported to induce activity in either the unstimulated (baseline) or stimulated (endoplasmic reticulum stress) conditions. We did not LD expand the MPRA SNPs because the MPRA data has SNP resolution since the alleles of single SNP were tested in a reporter construct, thereby breaking the LD structure that exists naturally in the human genome. Using these annotations, we calculated the enrichment of SNPs across progressive IEP percentile cutoffs using GARFIELD v2.0 (29), a logistic regression method that controls for the distance of each SNP to the nearest gene and the number of SNPs in LD. For these enrichments, we used the~24 million SNPs from the UK10K project and their LD estimates included in the GARFIELD package. Rather than using binarized P-value thresholds from a genetic association study to define the dependent variable for the logistic regression, we used IEP percentile cutoffs. Across all enrichment tests, we used the Bonferroni procedure to control for the number of tests performed.
For islet specificity comparisons, we calculated the enrichment of previously published MPRA data from the K562 and HepG2 cell lines (25). We compared the enrichment coefficients of the MIN6 MPRA data to K562 and HepG2 using a z-test as described in Paternoster et al. (30).
Finally, in addition to the genome-wide enrichments, we also compared the enrichment of credible set SNPs to SNPs in LD across IEP scores. We used credible set SNPs for T2D, fasting blood glucose levels, blood glucose levels, and glycated hemoglobin (HbA1c) levels as described in the "Refinement of credible sets for T2D and glycemic traits" section. For each credible set SNP, we selected the SNPs in LD (r 2 >0.8) based on the 1000 genomes phase 3 (v5) European reference panel (31). We calculated the enrichment of both credible set SNPs and LD SNPs across increasing IEP percentile cutoffs by performing a hypergeometric test (phyper function in R), controlling for the number of tests using the Bonferroni procedure. We also calculated the enrichment of credible set SNPs and LD SNPs across SNPs ranked by IEP scores using fGSEA v1.25.1 (32).

Refinement of credible sets for T2D and glycemic traits
To empirically define a threshold for prioritizing candidate causal SNPs using IEP scores, we used 99% credible sets from two T2D fine-mapping genetic studies (uniform priors): a European ancestry study (33) and a trans-ancestry study (34). To focus on likely distal regulatory signals, we excluded signals where >=1 SNP fell in a coding region from both datasets. For each T2D-associated signal in each study, we calculated the ratio of the IEP score of the largest IEP score to the second largest IEP score (IEP ratio 1:2 ). Next, we treated the trans-ancestry study as a "truth set", since trans-ancestry studies have greater power to fine-map due to different LD patterns across ancestries, and asked how often we could nominate the trans-ancestry candidate causal SNP using progressive IEP ratio 1:2 cutoffs in the European ancestry study. To make the trans-ancestry "truth set", we selected the 11 signals with one SNP in the 99% credible set that were not fine-mapped to a single candidate causal SNP in the European ancestry study. We selected the 179 from the European ancestry study with >1 SNP in the 99% credible set and performed a hypergeometric test (phyper function in R) across progressive IEP ratio 1:2 thresholds. We selected the minimum IEP ratio 1:2 cutoff with P<0.05, corresponding to an IEP ratio 1:2 of 24, and applied the cutoff to the trans-ancestry credible sets to identify additional signals with a signal candidate causal SNP.

Allelic imbalance analysis
For allelic imbalance analysis, we collected 24 islet ATAC-seq samples from non-diabetic donors for which SNP genotypes were also available: 1 sample from Varshney et al. (16), 10 samples from Rai et al. (37), and 13 samples from Khetan et al. (27). We note that in instances where the same sample was sequenced more than once across studies, we chose one sample randomly. We tested for allelic imbalance at all SNPs in the 95/99% credible sets for T2D and glycemic traits where our model predicts a single candidate causal SNP that previously had >1 candidate causal SNP at each association signal (Table S2). We followed the computational procedure outlined in Greenwald et al. (38). Briefly, we mapped reads using bwa v0.7.17-r1194-dirty (39) and filtered reads with low mapping quality ("-q 30 -M"). We then used WASP v0.3.4 (40) to remove duplicate reads and correct for reference mapping bias. Using a binomial test to assess imbalance on a per sample basis, we tested for allelic imbalance at SNPs that had at least two heterozygotes with two or more reads covering each allele. Finally, we calculated z-scores and used Stouffer's method (41) to calculate a combined z-score and P-value across samples, weighting the z-scores by the sequencing depth of each sample (42). We controlled for the number of tests using the Benjamini-Hochberg procedure (14).

TFBSs enrichment of candidate causal SNPs
To calculate the enrichment of the candidate causal SNPs in TFBSs, we calculated the fraction of SNPs that overlap TFBSs predicted from ATAC-seq footprints (see "Identification and analysis of transcription factor binding sites") using both the candidate causal set and a 10-fold control set, derived from islet DHS regions. We computed the fold-enrichment, calculated P-values using Fisher's exact test, and controlled for the number of tests using the Benjamini-Hochberg procedure (14).

Calculation of the effects of candidate causal SNPs on TFBS motifs
To calculate the allelic effects of candidate causal SNPs on binding motifs, we used TFBS motifs from ENCODE (19), JASPAR (20), and HOMER (http://homer.ucsd.edu/homer/motif/motifDatabase.html). We ran FIMO (packaged with MEME v4.9.0) with a P-value threshold of 0.01, keeping motifs that had hits for the sequences of both alleles. The remaining motifs were sorted in the decreasing order of the value |log(P-value ref ) -log(P-value alt |).

Electrophoresis mobility shift assay (EMSA) experiments
For EMSA experiments, we designed 21bp biotin end-labeled complementary oligonucleotides with each SNP allele tested centered at the 11th position of the oligo (Integrated DNA Technologies; Table S7). Each forward and reverse oligo for the biotinylated probes were biotinylated at their 5' ends. We annealed complementary oligos to create double-stranded probes for each tested sequence. Using the NE-PER Extraction Kit (Thermo Scientific), we prepared nuclear extract from human EndoC-βH3 cells in the non-proliferating, excised state (43) and completed EMSAs using the LightShift Chemiluminescent EMSA kit (Thermo Scientific) according to the manufacturer's instructions. Each binding reaction contained 1X binding buffer, 1 µg poly dI-dC, 4 µg EndoC-βH3 nuclear extract, and 200 fmol of biotinylated double-stranded probe. For competition reactions, 20-or 40-fold excess of unlabeled probe for each allele was added to the reaction mix and pre-incubated at 25°C for 15 minutes prior to addition of the labeled probe. We incubated reactions at 25°C for 25 minutes, after which we resolved DNA-protein complexes on a 6% DNA retardation gel (Invitrogen) and detected them by chemiluminescence after transfer and UV crosslinking to a nitrocellulose membrane.

Luciferase experiments
We synthesized 701bp DNA fragments around the identified SNPs (GenScript Biotech Corporation) and cloned the fragments into a pGL3-Promoter vector with XhoI and BglII restriction sites. We co-transfected the reporter construct and a pRL-TK Renilla luciferase reporter (Promega) into EndoC-bH1 cells using lipofectamine 2000 (Thermo Fisher). 48 hours after transfection, we lysed cells and loaded the cell lysates into a 96-well plate to detect luciferase activity using a Synergy H1 Microplate reader. We calculated relative luciferase activity, normalizing firefly luciferase activity to that of Renilla luciferase. We performed 12 replicates for each DNA fragment and compared the relative luciferase activity between each allele using the Wilcoxon rank sum test.

Data availability
The models from this study (architecture and weights), enhancer locations, EDR/ESR locations, and IEP scores for gnomAD SNPs are available through zenodo (https://zenodo.org/record/8161621).
Singapore and National University Health System, Singapore, Singapore. 44 Department of Anthropology, University of Fig. S1. Characterization of TREDNet phase one. Phase one TREDNet peak prediction accuracy for transcription factors (TFs), histone modifications (HMs), and DNase I hypersensitivity sites (DHSs; x-axis) compared to other models (colors) using area under the receiver operating characteristic (auROC; left) and area under the precision recall curve (auPRC; right) metrics (y-axis).

… C A A T G T T T A C T C … … A A T G A C T C A T G C …
G T

… C A A T G T T T A C T C … … A A T G A C T C A T G C … … C A A T G T T T A C T C … … A A T G A C T C A T G C …
2 kb enhancer (A) For each 2kb enhancer region, we calculate the TREDNet enhancer probability score (e) using the "wildtype" GRCh37 reference sequence (e wt ). To predict the mutational effect of each nucleotide, we calculate enhancer probability scores for all three non-reference nucleotides at each enhancer position while the remaining enhancer DNA sequence remains unchanged. We generate enhancer damage (ED) score mutational profiles at each sequence position (ED score i ) by computing the average difference between the wildtype and allele-specific enhancer scores. (B) We find that known TFBSs correspond to enhancer damaging regions (EDRs; red bars) and enhancer strengthening regions (ESRs; blue bars). To identify known and unknown TFBSs (EDRs/ESRs in gray bars) directly from ED scores, we train a second model to annotate each nucleotide position as an enhancer damaging, enhancer strengthening, or neutral. TF-1 depicts a novel predicted EDR. TF-2 depicts a novel predicted ESR. The ED scores in this plot are for illustrative purposes only.