Data showing atherosclerosis-associated differentially methylated regions are often at enhancers

Atherosclerosis involves phenotypic modulation and transdifferentiation of vascular smooth muscle cells (SMCs). Data are given in tabular or figure format that illustrate genome-wide DNA methylation alterations in atherosclerotic vs. control aorta (athero DMRs). Data based upon publicly available chromatin state profiles are also shown for normal aorta, monocyte, and skeletal muscle tissue-specific DMRs and for aorta-specific chromatin features (enhancer chromatin, promoter chromatin, repressed chromatin, actively transcribed chromatin). Athero hypomethylated and hypermethylated DMRs as well as epigenetic and transcription profiles are described for the following genes: ACTA2, MYH10, MYH11 (SMC-associated genes); SMAD3 (a signaling gene for SMCs and other cell types); CD79B and SH3BP2 (leukocyte-associated genes); and TBX20 and genes in the HOXA, HOXB, HOXC, and HOXD clusters (T-box and homeobox developmental genes). The data reveal strong correlations between athero hypermethylated DMRs and regions of enhancer chromatin in aorta, which are discussed in the linked research article “Atherosclerosis-associated differentially methylated regions can reflect the disease phenotype and are often at enhancers” (M. Lacey et al., 2019).

© 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Value of the data
Genes with atherosclerosis-associated differentially methylated regions (DMRs) were defined from publicly available bisulfite-sequencing data by a stringent statistical analysis. This analysis revealed predominant hypermethylation, but also atherosclerosis-relevant hypomethylation, in atherosclerotic aorta and will be a source of disease-linked DMRs for further studies. Data include figures and tables detailing the locations and percent methylation differences of hypermethylated and hypomethylated DMRs and whether these overlap enhancer or promoter chromatin in aorta; these data are valuable for understanding the regulation of expression of genes involved in atherosclerosis. The figures and tables are also a novel resource for understanding the tissue-specific epigenetics of many individual genes in normal aorta (e.g., smooth muscle cell-associated genes and HOX genes). The data illustrate the often overlooked importance of enhancer regions to disease as well as to differentiation, which will aid future research on the regulation of gene expression.

Data
Tables 1-5 (included as Supplementary files) show statistically significant atherosclerosis-associated DMRs (athero DMRs), their over-representation among enhancers and super-enhancers, the functional associations of their linked genes based on gene ontology (GO) analysis, a literature survey of the involvement of DMR-linked genes in atherosclerosis, and RNA-seq data for the tissue-specific expression of these genes (Table 5). In addition, data are presented for specific examples of genes in which athero DMRs only partially overlap leukocyte-associated DMRs (Figs. 1 and 2) or overlap aorta-related enhancer and super-enhancer chromatin in genes important for different aspects of proper smooth muscle cell function (Figs. 3-6). Lastly, the very strong relationship of athero DMRs to developmental transcription factor-encoding HOX and TBX genes is illustrated (Figs. 7-11).

Caption excerpts for Figs. 1 and 2:
... The density of CpGs is shown; there were no CpG islands in this region according to the UCSC Genome Browser [5]. (C) Bisulfite-seq tracks are plotted as average % methylation at CpGs; blue horizontal bars, regions that display statistically significant hypomethylation relative to the rest of the same genome. Light blue highlighting, the region of atherosclerosis-associated hypomethylation overlapping monocyte- (and leukocyte-) hypomethylation; gray highlighting, the region that did not exhibit an athero hypometh DMR but did show monocyte- (and leukocyte-) hypomethylation. Ctl, control; SkM, skeletal muscle; brain, prefrontal cortex; heart, left ventricle; PBMC, peripheral blood mononuclear cells; neut, neutrophils; expr, expression; str, strong; wk, weak; biv, bivalent; prom, promoter; enh, enhancer; repr, repressed; Txn-chrom, chromatin with a type of histone modification seen in many actively transcribed chromatin regions (histone H3 lysine-36 trimethylation).
... Tissue-specific and athero DMRs as in Fig. 1. (C) Bisulfite-seq profiles of DNA methylation, with blue horizontal bars indicating statistically significant hypomethylation relative to the rest of the same genome, except for Ctl aorta C, which did not have this kind of analysis in the public database. Gray highlighting, the region that did not exhibit an athero hypometh DMR but did show monocyte- (and leukocyte-) hypomethylation; mod, moderate.

Bioinformatics
For the atherosclerotic and control aorta samples from the same individual (an 88 yo female; athero aorta A from the aortic arch and control aorta A from the thoracic aorta), we used the whole-genome bisulfite sequencing (bisulfite-seq) data from Zaina et al. [2]. In addition, we used bisulfite-seq profiles from two other control aorta samples from the Roadmap Epigenomics Project [3,4]: control aorta B, from a 34 yo male, and control aorta C, from a 30 yo female. The aortic subsection for control aorta C is not known, but control aorta B, which was used for the bisulfite-seq profiles and for the analyzed Roadmap histone modification and chromatin segmentation profiles, was intra-abdominal aorta from below the renal arteries and above the iliac bifurcation. The Roadmap databases, including the bisulfite-seq methylomes and the chromatin state segmentation (chromHMM, AuxilliaryHMM) profiles, are available at hubs for the UCSC Genome Browser hg19 [5] and are as previously described [6]. Chromatin state segmentation is based upon histone modifications (histone H3 lysine-4 mono- or trimethylation; H3 lysine-27 acetylation or trimethylation; H3 lysine-36 trimethylation; and H3 lysine-9 trimethylation). For the DNA methylation analysis, sequencing coverage of the bisulfite-treated DNA was similar for the matched atherosclerotic and control samples. Note that the Roadmap sample labeled "macrophage" in the bisulfite-seq track at the UCSC Genome Browser [5] is actually primary CD14+ monocytes from blood, like the corresponding chromatin segmentation track [3]. The skeletal muscle (SkM) sample for bisulfite-seq and chromatin state was psoas muscle. The color code for the 18-state chromatin state segmentation was slightly simplified from the original [3].
Quantification of RNA-seq for tissues was from the GTEx database, using transcripts per million (TPM; Table 5) values from more than 100 samples for each tissue type [7]; the aorta tissue used for these RNA-seq analyses was from the thoracic region. Functional associations of DMRs were determined with the Genomic Regions Enrichment of Annotations Tool (GREAT) [8], and associations of the genes themselves with the Database for Annotation, Visualization and Integrated Discovery (DAVID) [9]. Aorta super-enhancers were determined from the dbSuper database [10].
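For readers comparing the Table 5 expression values, TPM is conventionally defined as below, where $c_i$ is the number of reads assigned to transcript $i$ and $\ell_i$ is its (effective) length. This is the generic textbook definition; the exact quantification pipeline used by GTEx to produce its TPM values is not described here and may differ in details such as how effective lengths are computed.

```latex
$$
\mathrm{TPM}_i \;=\; \frac{c_i/\ell_i}{\sum_{j} c_j/\ell_j}\times 10^{6},
\qquad \sum_i \mathrm{TPM}_i = 10^{6}.
$$
```

Because every sample sums to $10^6$, TPM values are proportions of transcripts and can be compared across tissues more directly than raw read counts.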

Determination of athero and tissue-specific DMRs
Bisulfite-seq data from the athero aorta A and control aorta A samples were first merged and analyzed site by site by applying Fisher's exact test to the counts of methylated and unmethylated CpG reads in each sample, producing a site-specific p-value $p_i$ for each CpG site $i$. Candidate DMRs were then identified by determining the joint probability of a sequence of five or more consecutive p-values, $(p_i, p_{i+1}, \ldots, p_{i+k})$ with $k \ge 4$, according to the Uniform Product (UP) distribution as described in our previous study [11]; each candidate DMR was required to begin and end with a statistically significant site. This analysis identified statistically significant regions at the 0.05 level, which were then filtered to include only regions with an average percent methylation difference (PMD) of at least ±20%, a length greater than 250 bp, and no gaps >200 bp between consecutive sites. Next, these samples were merged with control aorta samples B and C, using custom scripts to correct for single-bp shifts due to variable pre-processing routines. For all sites present in all four samples, logistic regression models were fit to the counts of methylated and unmethylated reads at each CpG site to determine the statistical significance of the difference in percent methylation between athero aorta A and the three control aortas. The resulting p-values for athero aorta vs. the three controls were then analyzed with our UP method to identify candidate DMRs, which were filtered as above for length, PMD, and gaps. Our final set of athero DMRs was determined from those DMRs identified both in athero A vs. control aorta A and in the more general comparison of athero A vs. control aortas A, B, and C. PMD values are reported based on the differences observed for athero A vs. control aorta A.
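The site-wise test, run scoring, and filtering steps of the athero A vs. control A comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R scripts: the helper names (`fisher_exact_two_sided`, `uniform_product_pvalue`, `CandidateDMR`, `passes_filters`) are hypothetical, and the run score assumes the p-values in a run are independent Uniform(0,1) under the null, so a run with observed product $x$ has joint probability $x\sum_{j=0}^{n-1}(-\ln x)^j/j!$. The published UP method [11] may treat dependence between neighboring sites or the choice of run boundaries differently, and measuring length from the first to the last CpG is also an assumption.

```python
import math
from dataclasses import dataclass


def _hypergeom_pmf(a, row1, col1, n):
    """Probability that the top-left cell equals a in a 2x2 table with fixed margins."""
    return math.comb(col1, a) * math.comb(n - col1, row1 - a) / math.comb(n, row1)


def fisher_exact_two_sided(meth_a, unmeth_a, meth_b, unmeth_b):
    """Two-sided Fisher's exact test on methylated/unmethylated read counts
    at one CpG site in sample A (e.g. athero) and sample B (e.g. control)."""
    row1 = meth_a + unmeth_a            # reads in sample A
    col1 = meth_a + meth_b              # methylated reads in both samples
    n = row1 + meth_b + unmeth_b        # all reads at this site
    p_obs = _hypergeom_pmf(meth_a, row1, col1, n)
    p = 0.0
    for a in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p_a = _hypergeom_pmf(a, row1, col1, n)
        if p_a <= p_obs * (1 + 1e-7):   # tables no more likely than the observed one
            p += p_a
    return min(1.0, p)


def uniform_product_pvalue(pvals):
    """P(U_1 * ... * U_n <= prod(pvals)) for independent Uniform(0,1) U_i."""
    n = len(pvals)
    t = -sum(math.log(max(p, 1e-300)) for p in pvals)  # -ln(product of p-values)
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= t / j                   # accumulates t**j / j!
        total += term
    return min(1.0, math.exp(-t) * total)


@dataclass
class CandidateDMR:
    positions: list   # sorted CpG coordinates (bp) in the run
    pmd: float        # mean % methylation difference, athero minus control


def passes_filters(dmr, min_sites=5, min_abs_pmd=20.0, min_len=250, max_gap=200):
    """Apply the site-count, PMD, length and gap criteria stated in the text."""
    pos = dmr.positions
    length = pos[-1] - pos[0] + 1       # assumption: first to last CpG, inclusive
    gaps_ok = all(b - a <= max_gap for a, b in zip(pos, pos[1:]))
    return (len(pos) >= min_sites and abs(dmr.pmd) >= min_abs_pmd
            and length > min_len and gaps_ok)
```

For example, a site with 10 methylated and 0 unmethylated reads in one sample and the reverse in the other gives a two-sided Fisher p-value of 2/184756, and a run of six CpGs spaced 50 bp apart with a PMD of 25% passes all filters.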
To determine tissue-specific effects among the selected Roadmap samples (aorta, left ventricle, SkM, lung, adipose, and monocyte), "one-to-many" comparisons were run in which logistic regression models were fit to determine the percent methylation (PM) differences associated with a selected (target) sample relative to the others as a group. Because not all sites were present in all six samples, we required that any tested site be present in the target sample and in at least four of the five non-target samples. DMR identification and filtering were done using the same approach as for the athero DMR analyses. All preprocessing and analysis were performed in R version 3.4.4 [12] using custom scripts. To identify overlaps of athero DMRs with normal tissue DMRs, we used an R script for each of four normal tissues (aorta, monocytes, SkM, and heart) that tested each athero DMR for an overlap of at least 50 bp with any DMR of that tissue.
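The "one-to-many" site test, the site-inclusion rule, and the 50-bp overlap check can be sketched as below. This is a simplified Python illustration, not the authors' R models: it assumes the logistic regression contains only a target-vs-others indicator, in which case the maximum-likelihood coefficient equals the log odds ratio of the pooled 2x2 table and its Wald standard error is $\sqrt{1/a+1/b+1/c+1/d}$. The models actually used may include further terms (for example, per-sample effects). The 0.5 added to zero cells, closed-interval coordinates, and all function names are hypothetical choices made for this sketch.

```python
import math


def one_vs_group_test(target, others):
    """Per-site Wald test of a target sample vs. the other samples as one group.

    target: (methylated, unmethylated) read counts for the target sample.
    others: list of (methylated, unmethylated) counts for the non-target samples.
    With a single target-vs-others indicator, the binomial logistic-regression
    MLE of the indicator coefficient is the log odds ratio of the pooled table.
    """
    a, b = target
    c = sum(m for m, _ in others)
    d = sum(u for _, u in others)
    if 0 in (a, b, c, d):
        # Assumption: add 0.5 to every cell when a zero count makes the MLE infinite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    beta = math.log((a * d) / (b * c))            # log odds ratio of methylation
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal p-value
    return beta, p


def site_is_testable(observed, target, min_others=4):
    """observed: dict sample -> bool (site covered). Keep the site only if the
    target has it and at least `min_others` non-target samples do."""
    if not observed.get(target, False):
        return False
    return sum(ok for s, ok in observed.items() if s != target) >= min_others


def overlap_bp(x, y):
    """Overlap in bp of two closed intervals given as (chrom, start, end)."""
    if x[0] != y[0]:
        return 0
    return max(0, min(x[2], y[2]) - max(x[1], y[1]) + 1)


def overlaps_tissue_dmr(athero_dmr, tissue_dmrs, min_bp=50):
    """True if the athero DMR overlaps any DMR of one tissue by >= min_bp."""
    return any(overlap_bp(athero_dmr, t) >= min_bp for t in tissue_dmrs)
```

Pooling the non-target samples into one group is what makes this a "one-to-many" comparison: with six samples, the rule in `site_is_testable` keeps a site for, say, aorta only if aorta and at least four of the other five samples cover it.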

Mapping DMRs to genes and enhancer chromatin
Scripts in R [12] with adaptations [13,14] were used to determine which genes and enhancer chromatin (enh chromatin) segments were associated with athero DMRs. To determine the gene isoform associated with a given DMR, the refFlat table [5] for the RefSeq hg19 genome (accessed January 7, 2018) was used. For each DMR, the protein-coding gene (or, secondarily, the non-protein-coding gene) with the largest overlap was selected, considering gene regions in the following order of precedence, with distances given relative to the transcription start site (TSS) or the mRNA-determined transcription end site (TES): TSS - 5 kb to TSS + 5 kb; TSS + 5 kb to the TES; intergenic (other sequences). To determine overlaps with enhancer chromatin, we used DMRs that had a total overlap of at least 50 bp with enh chromatin from any of the following enhancer states in the 18-state chromatin segmentation model [3,5]: state numbers 3, 8, 9, or 10.
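The gene-assignment precedence and the enhancer-overlap rule can be sketched as below. This is one possible reading of the text, written in Python rather than the authors' R: it assumes the region tier (TSS ± 5 kb before TSS + 5 kb to TES) takes precedence over the coding/non-coding preference, uses half-open coordinates, and sums overlaps across all enhancer-state segments. The `Gene` record, the region labels, and the function names are hypothetical; the state numbers 3, 8, 9, and 10 are taken from the text.

```python
from dataclasses import dataclass

ENHANCER_STATES = {3, 8, 9, 10}  # 18-state model state numbers given in the text


@dataclass
class Gene:
    name: str
    chrom: str
    strand: str      # "+" or "-"
    tx_start: int    # refFlat-style txStart (lower coordinate)
    tx_end: int      # refFlat-style txEnd (upper coordinate)
    coding: bool     # protein-coding (True) or non-protein-coding (False)


def _overlap(s1, e1, s2, e2):
    """Overlap in bp of two half-open intervals [s, e); 0 if disjoint or empty."""
    return max(0, min(e1, e2) - max(s1, s2))


def gene_regions(g, flank=5000):
    """Return [(TSS - 5 kb, TSS + 5 kb), (TSS + 5 kb, TES)] in genome coordinates."""
    tss, tes = (g.tx_start, g.tx_end) if g.strand == "+" else (g.tx_end, g.tx_start)
    promoter = (tss - flank, tss + flank)
    body = (tss + flank, tes) if g.strand == "+" else (tes, tss - flank)
    return [promoter, body]


def assign_dmr(dmr, genes):
    """dmr: (chrom, start, end). Returns (gene name, region) or (None, 'intergenic')."""
    chrom, start, end = dmr
    for tier, label in enumerate(("TSS-5kb..TSS+5kb", "TSS+5kb..TES")):
        for want_coding in (True, False):   # protein-coding genes take precedence
            best, best_bp = None, 0
            for g in genes:
                if g.chrom != chrom or g.coding != want_coding:
                    continue
                s, e = gene_regions(g)[tier]
                bp = _overlap(start, end, s, e)
                if bp > best_bp:            # keep the gene with the largest overlap
                    best, best_bp = g, bp
            if best is not None:
                return best.name, label
    return None, "intergenic"


def overlaps_enhancer(dmr, segments, min_bp=50):
    """segments: (chrom, start, end, state) chromatin segments. True if the DMR's
    total overlap with enhancer-state segments is at least min_bp."""
    chrom, start, end = dmr
    total = sum(_overlap(start, end, s, e) for c, s, e, state in segments
                if c == chrom and state in ENHANCER_STATES)
    return total >= min_bp
```

Summing overlaps across segments matches the "total" overlap criterion: a DMR touching two adjacent enhancer segments by 30 bp and 20 bp counts as a 50-bp overlap even though neither segment alone reaches the threshold.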