The atlas of human cardiac promoters and enhancers reveals an important role for regulatory elements in heart failure

A continuous increase in the prevalence of heart failure and the lack of adequate therapy highlight poor understanding of the underlying genetic regulatory mechanisms involved in heart failure pathogenesis. Growing evidence has demonstrated a signi�cant contribution of non-coding genome regulatory elements towards transcriptomic changes in heart disease. Thus, there is a pressing need for a comprehensive resource of the human cardiac regulatory network in healthy and failing states. We applied cap analysis of gene expression sequencing to directly measure the expression of RNA associated with enhancers and promoters. Based on this data, we constructed the atlas of transcribed cardiac regulatory elements from 21 healthy and 10 failing (ischemic and non-ischemic cardiomyopathy) human hearts. In total, we have sequenced 109 samples from the left and right atria and ventricles, identifying 17,668 promoters and 14,920 enhancers associated with 14,519 genes. Leveraging this atlas, we provide insights into functional and structural regulatory changes between healthy and failing hearts. Healthy atria and ventricles had distinct pathway enrichment and transcription factor binding patterns, signi�cantly remodeled by heart failure. Using the advantages of deep sequencing that allow effective analysis of cis-regulatory elements-derived RNA, we found that heart failure is associated with the expression of transcripts derived from alternative promoters and a speci�c set of transcribed enhancers. Furthermore, we identi�ed a high prevalence of single nucleotide polymorphisms associated with cardiovascular diseases within the regulatory regions highlighting their importance in disease pathogenesis. This open-source atlas will serve the cardiovascular community to improve understanding cardiac regulatory network and facilitate the development of novel therapeutics.


Introduction
Cardiovascular disease is the leading cause of morbidity and mortality in the United States and worldwide.
Gene expression, which governs protein production and other cell processes, is aberrant in heart failure 1 .The network of regulatory elements, such as promoters and enhancers, modulates the gene expression by interacting with transcription factors in a 3D chromatin space 2,3 .Both promoters and enhancers are cisregulatory elements meaning they are located on the same chromosome and near their target gene 3 .
Promoters usually contain a transcription start site (TSS), transcription factor binding site (TFBS), and TATAbox which facilitate gene transcription 3 .Enhancers are short regions of DNA that bind transcription factors and co-activators which then interact with gene promoters to enhance the rate of gene expression 4 .
The expression of every gene is regulated by several regulatory elements in an upstream region.Due to the complexity of the regulatory mechanism, precise genomic locations of promoters and enhancers are not yet well characterized.Mutations in promoter and enhancer regions can change the transcription factor binding a nity, resulting in altered gene expression 2 .Co-localization of regulatory elements with cardiovascular disease-associated variants can explain those transcriptional changes 2 .A comprehensively annotated library of cardiac regulatory elements can give meaningful insights into clinically relevant genes and potentially explain the causes of gene expression remodeling during disease pathogenesis.
Cap analysis of gene expression (CAGE) is a high throughput RNA expression analysis approach for TSS identi cation at one base pair (bp) resolution.This approach is used in FANTOM5 and ENCODE projects for accurate 5' ends annotation in human, mouse, and other plant and animal genomes 5 .Deep sequencing of non-ampli ed CAGE libraries allows for precise estimation of expression of coding and non-coding transcripts.Moreover, since the number of mapped tags directly depends on the mRNA amount in the sample, this powerful approach is also useful for differential expression analysis of the mapped clusters.The CAGE technique allows identifying alternative promoters that might affect the nal protein structure or even estimate short TSS shifts linked to regulatory network features 6,7 .Enhancers are identi ed by bidirectional CAGE signal pro les, which are usually functional at a much higher frequency than nontranscribed enhancers if compared with DNase I hypersensitive sites or histone modi cation signals 8 .The bidirectional pro le of active enhancers occurs due to RNA transcription from the center of enhancer to opposite directions 4 .Transcribed enhancer RNAs interact with several regulatory complexes and are transient in nature.Such behavior requires higher sequencing depth to detect low expression of enhancer RNA.Additionally, the transcribed enhancers accumulate causative mutations and could be associated with promoters within a complex regulatory network 8 .
We sequenced myocardial tissue samples from healthy donor hearts and hearts from ischemic and nonischemic cardiomyopathy (ICM and NICM) patients obtained at the time of heart transplantation to utilize CAGE sequencing for regulatory network detection and its remodeling.Despite the common manifestations of the diseases during end stage such as ventricular dilation, myocardial brosis, and reduced myocardial contractility, genetic differentiation between ICM and NICM could be essential for prognosis and tailored pharmacological therapy 9 .Microarray studies showed genes related to immunologic processes are activated in NICM while immediate-early response genes and cytokine pathways are upregulated in ICM 9 .Yet, those studies only had a subset of genes, and it remains unclear what governs those transcriptional changes.
Here, we present the comprehensive atlas of promoters and enhancers from four cardiac chambers: left and right atria and ventricles of 21 healthy and 10 failing human hearts.This annotated atlas presents 17,668 promoters and 14,920 enhancers associated with 14,519 genes expressed in the human heart.We developed a robust classi cation pipeline and identi ed 351 novel promoters and 1,318 novel enhancers not previously described.Furthermore, we identi ed cardiac chamber-speci c differential regulatory element usage between healthy and failing hearts and between ICM and NICM etiology of heart failure.Using the genome-wide association studies (GWAS) database of common cardiac diseases, we located 1,831 single nucleotide polymorphisms (SNPs) in regulatory regions.We identi ed multiple cases where nucleotide change signi cantly affects the DNA binding motif of a transcription factor.

Map of the human heart genome regulatory network
Our atlas contains 109 individual CAGE libraries prepared from 31 healthy and failing hearts (Figure 1A).We aligned 1025M reads to hg38 genome assembly total with an average mapping ratio of 97.7 % (Figure 1B).
Most of the mapped CAGE tags were in promoter regions (Figure 1C).CAGE signal clustering resulted in 55,204 decomposition peak identi ers (DPIs) and 10,254 bidirectional enhancers (promoter and exon regions were masked) (Figure 1D-E).Heart CAGE DPI clusters associated with 14,519 genes corresponding to 27,557 Gencode or 20,411 Refseq transcripts.Bidirectional enhancers were connected to DPI clusters in the 500kbp range: 1,142 bidirectional enhancers have signi cant associations with 1,491 DPI related to 469 genes (correlation thresholds r ≥ 0.5 and p-value < 0.05).
Next, we performed classi cation of CAGE peaks by training TSSClassi er on control 2kb sequences such as eukaryotic promoter database (EPD), reference transcription start site (refTSS), promoter-like sequences (PLS), proximal enhancer-like sequences (pELS), distal enhancer like sequences (dELS) from Encode, and FANTOM5 enhancers.This step resulted in 30,036, 37,965, 31,008, 12,768, 15,062, and 5,616 matched heart CAGE clusters, respectively (Supplementary Figure 1A).Heart CAGE clusters were checked to have RNA-seq signals using Encode and other available sources 10,11 .This gives an option to lter out CAGE peaks without an RNA-seq signal to identify new markers.In total, 75.8% of DPI clusters were associated with transcription, but in the case of bidirectional enhancers, only 14% had overlap with RNA-seq signal.Other Encode libraries, including DNase-seq, assay for transposase-accessible chromatin sequencing (ATAC-seq), Rampage, and chromatin immunoprecipitation sequencing (ChIP-seq), were used for overlapping and classi cation (Supplementary Figure 1B-C, Supplementary Table 2-3).This step allowed to lter out CAGE peaks without known ATAC/Rampage/DNase/Chip-seq signals to identify new targets.For example, ChIP-seq based classi cation con rmed the total of 10,154 active promoters and 4,533 active enhancers combined between DPIs and bidirectional enhancers.There were 29,332 CAGE peaks overlapping with signi cant PolR2A signal.Most of the DPI CAGE peaks had an "active promoter" label as expected, while ChIP-seq based classi er for bidirectional enhancers showed just a slightly higher number of "active enhancers" compared to "active promoters."This can be explained by the insu cient sample diversity of ChIP-seq libraries.Most of the TSS (65%) showed the canonical YR initiator motif, 2.4% showed alternative YC, and ~30% had other dinucleotide frequency with the uncertain functional role 6 (Supplementary Figure 1D).This "other" motif had enriched G nucleotide which could be related to CAGE library preparation bias or other elements with the uncertain functional role.In addition, we validated our bidirectional enhancers by overlapping them with regions from other databases (Supplementary Figure 1E).
Based on GENCODE annotation, the DPI clusters were primarily found in promoter regions of genes, while bidirectional enhancers were located in intergenic and intronic regions (Supplementary Figure 1F).Usage of the EPD-like cluster classi cation assigned the location of 98% of DPIs to promoter regions.Initiator motifs for classi ed DPIs were well represented in promoter-like clusters and look similar to EPD, refTSS, and PLS with YR motif (Supplementary Figure 2).A similar motif was also present in enhancer-like clusters (pELS), but the motif is less evident in dELS-like regions.
In total, we de ned 17,668 promoter and 14,920 enhancer regions active in the human heart, while 150 regions exhibited ambiguous features.Aggregation plots and heatmaps for Encode epigenetic libraries overlapping heart CAGE consensus regions are shown in Supplementary Figure 3.

Heart CAGE atlas revealed novel regulatory clusters speci c for cardiac tissue
Compared to other studies, heart CAGE peaks are well supported by other omics data, including histone methylation, acetylation, ATAC-seq, DNase-seq, and Chip-seq for Pol2 (Figure 2).De ned CAGE peaks have relatively high GC content and conservation scores, while the total length of consensus regions is the shortest resulting in higher resolution for motif and SNP analysis.
From the overlap with the studies, where heart regulatory regions are available, we identi ed 3,204 peaks (Supplementary Figure 4).These unique peaks contained 351 novel promoters, 1,318 novel enhancers, 8 ambiguous, 500 not classi ed consensus regions, and 43 alternative peaks.Direct comparison to the FANTOM5 database, where the CAGE technique is used as the primary tool for promoter and enhancer calling, showed 12,670 new DPIs (putative TSS) and 7,381 bidirectional enhancers.Still, EPD-like cluster classi cation reduced the number of new predicted promoters to 3,238 (Supplementary Figure 5A).The largest collection of human enhancers and promoters developed up to date (FANTOM5 project) contains only two heart-derived samples in total: one left atrial and ventricular sample, and through higher number of samples and deeper sequencing in this study allowed us to identify new cardiac peaks.Furthermore, the overlap with FANTOM5 peaks drastically improves the speci city of the epigenetic signal and clearly de nes the location of transcribed cis-elements (Supplementary Figure 5B).Applying de novo motif enrichment, we con rmed the presence of the transcription initiation sequences such as TATA-box and initiator element (INR) in the DPI regions (Supplementary Figure 6A).The most representative peak in the bidirectional enhancer region was the transcription factor binding site for the myocyte-speci c enhancer factor 2B (MEF2B) motif, a muscle-speci c gene activator (Supplementary Figure 6A).Overall, there was an accumulation of TFBS close to TSS in heart promoters and in the center of the bidirectional enhancers, which is expected from promoter and enhancer regions, thus con rming the accurate classi cation of promoter and enhancer regions (Supplementary Figure 6B).
Besides looking at the CAGE peak overlap with different databases, we also tested how sequencing depth affects comprehensive cluster detection.By increasing the library size, we have observed the sequencing depth reaching a plateau with no additional active genes and transcripts being detected (Supplementary Figure 7A).Our samples were saturated at this sequencing depth level in terms of new genes and transcripts active in different parts of the heart.Interestingly, we observed that the right and left atria had ~2,000 more active genes as compared to the left and right ventricles.Compared to FANTOM5 atrial and ventricular samples, we detected additional ~2,200 genes expressed in the human heart chambers (Supplementary Figure 7A).These genes participate in important signaling and metabolic pathways such as Wnt, mTOR, and autophagy (Supplementary Figure 7B).Libraries appeared saturated at the level of 1-2 million reads.Despite the saturation, deeper sequencing may still uncover the activity of low expressed genes and pseudogenes, including snRNA related to RNA transport and mRNA splicing 12 .

Differential expression analysis reveals large differences in transcription by cis-regulatory elements between healthy and failing atria and ventricles
Dimensionality reduction showed a clear difference between atria and ventricles and between healthy and failing states (Figure 3A).However, we noticed a trend that atrial expression from failing hearts becomes similar to that of ventricular samples, with some exceptions.Information about these samples is available on the Zenbu reports platform with an interactive principal component analysis plot.Correlation analysis con rms those ndings by showing a strong correlation within atria and ventricles, with most of the failing atrial samples showing ventricular pattern of expression (Supplementary Figure 8).Differential expression analysis between healthy and failing samples resulted in 1,748 and 915 CAGE clusters signi cantly different within atria and ventricles, respectively (Figure 3B-C).Of those signi cant clusters, 645 promoters and 223 enhancers were differentially expressed in healthy versus failing atria, while 291 promoters and 105 enhancers were differentially expressed in healthy versus failing ventricles.Healthy atria showed upregulation in genes related to the upkeep of the immune system surveillance genes, such as several chemoattractants for neutrophils (CXCL1, CXCL3, and CXCL8), cytokines (CSF3), and immunoadhesion activators (SELE).Genes involved in glucose metabolism (PFKM) and lipid utilization (LPL) were differentially activated in failing atria.Healthy ventricles also had activation of the innate immune system (FCN3, LCN6), while failing ventricles had defense response activation (SPP1, ITLN1).
Functional analysis of differentially expressed clusters between healthy and failing samples also resulted in enrichment of the immune system-related GO terms, including cytokines, in ammation, chemokines in addition to muscle system processes, collagen, and contractile ber tags (Figure 3D).Among KEGG pathways most enriched were TNF signaling, complement, and cytokine-related pathways (Figure 3E).
Disease ontology enrichment analysis resulted, as expected, in mostly heart-related diseases such as cardiovascular system disease, cardiomyopathy, and other muscle and connective tissue-related diseases (Figure 3F).Connecting top differentially expressed clusters with heart diseases identi ed inactivation of cardiac structure-speci c genes (MYH6, TNNT2) and upregulation of metabolism (LPL, APOA1) and immune response genes (CXCL10, CD36) in failing hearts (Figure 3G).Functional annotation from the FANTOM5 database, such as cell type, anatomy, and disease ontology, was transferred on heart CAGE clusters (Supplementary Figure 9A).The cell type annotation revealed that TSS in human heart chambers had features of epithelial cells, myoblasts, broblasts, smooth muscle cells, electrically active cells, endothelial cells, and blood vessel cells.Such precise identi cation of cardiac cell types in clusters obtained from bulk tissue demonstrates the cell level speci city of regulatory elements.
Organ (UBERON) 13 ontology enrichment resulted in similarity to FANTOM5: heart, vessels, artery, and circulatory system-related samples, but also to embryo and lateral plate mesoderm samples.Disease ontology showed cardiovascular and heart-related diseases, but also cancer and disease of cellular proliferation.Similar patterns occurred in promoters and enhancers taken separately, con rming a functionally coherent behavior between the two types of regulatory elements.Then, to take a closer look into the functional differences between healthy and failing hearts, we annotated statistically signi cant differentially expressed clusters (Supplementary Figure 9B).The annotation showed a considerable involvement of the immune system and various embryogenesis-related pathways supporting previous observations of dedifferentiation of cells to the embryonic state during heart failure.Differential expression analysis of ICM versus NICM identi ed disease-speci c clusters, 323 for NICM and 255 for ICM (Figure 4A).Functional annotation with GO and KEGG highlighted several immune pathway activations in NICM (Figure 4B-C) and enrichment of lipid metabolism in ICM (Figure 4D-E).Then, we select several genes strongly associated with heart failure 14 to investigate their TSS usage.Variations in TSS selection re ect the diversity of preinitiation complexes and can impact post-transcriptional RNA fates.We noticed no signi cant shift if ICM is directly compared to NICM.However, we observed signi cant differential remodeling of some clusters when ICM and NICM are compared separately to healthy samples.For example, the sarcomere gene, TTN, had an upregulated cluster in ICM, while NICM had an upregulated cluster in the junctional membrane gene, JPH2 (Supplementary Figure 1).Cytoskeleton-related gene, FLNC, had a differential usage of the TSS between NICM and ICM, the former having one cluster inactivated while the latter having two clusters signi cantly inactivated compared to a healthy state.These observations demonstrate that different disease etiology could cause activation of speci c regulatory elements of the genome.

SNPs in transcription factor binding sites can alter gene expression
While the most studied SNPs are located within the coding sequences, we also identi ed a 10% fraction of heart GWAS SNPs, which reside in regulatory regions (Figure 5A), most frequently in the promoter region (Figure 5B).The SNP density in heart CAGE clusters is higher than the genome-wide density, especially in classi ed consensus regions.The highest frequency of disease-associated SNPs corresponds to familial hypertrophic cardiomyopathy, congenital heart disease, atrial septal defect, cardiomyopathy, and familial atrial brillation (Figure 5C).Among National Human Genome Research Institute -European Bioinformatics Institute (NHGRI-EBI) GWAS SNPs, speci c features included resting heart rate and glycated hemoglobin levels (Supplementary Figure 14A).Promoter regions showed accumulation of reticulocyte related SNPs while enhancers accumulated cardiac electrophysiology related SNPs (Supplementary Figure 14B-C).
Differentially expressed CAGE clusters between healthy and failing atria and ventricles accumulated GWAS SNPs related to blood pressure, autoimmune traits, blood protein levels in failing ventricles and electrocardiogram morphology, warfarin maintenance dose in failing atria (Figure 6A, Supplementary Table 6).To better understand the role of the SNPs in the regulatory regions, we identi ed the cases where these SNPs signi cantly change TFBS.In total, there were 2,679 GWAS SNPs localized in 479 unique TFBS.As an example, we looked at troponin, a known marker for the diagnosis of myocardial infarction and heart failure.
The promoter of one of three troponin subunits, TNNI3, showed a statistically signi cant increase in its activity in failing hearts and was linked with SNP in its TFBS (Figure 6B-C).Other cases of SNP in TFBS are available on the Zenbu reports platform (https://fantom.gsc.riken.jp/zenbu/reports/#heart%20CAGE%20SNP).

Discussion
Heart failure is a leading cause of death in the USA and worldwide.Fundamental genetic mechanisms of cardiovascular diseases remain one of the primary targets of cardiovascular science.The genome regulatory network and its connection to cardiovascular disease are not well studied.Here, we present a comprehensive and fully annotated CAGE atlas of human heart promoters and enhancers from four chambers of the healthy and failing human hearts, including both atria and ventricles.We report that atria have 2,000 more regulatory elements versus ventricles which result in a chamber-speci c expression pattern and explains higher exibility of atrial phenotype.For annotation, we overlayed our results with many data sets available for the adult human heart.Deeper sequencing allowed the identi cation of regulatory elements with low expression levels involved in cell maintenance and embryogenesis.We con rmed the presence of over 17,000 promoters, and due to deeper sequencing and larger sample size, we detected an order of magnitude more enhancers, ~14,000 current state of knowledge in this eld 12 .We developed a robust classi cation pipeline and identi ed over 3,000 novel regulatory regions not present in the FANTOM5 database and not described in other studies.Additionally, epigenetic markers showed that our dataset has a very concentrated signal of cardiac regulatory element regions compared to other databases.Finally, our enrichment analysis showed that our atlas represents nearly complete landscape of transcribed regulatory elements in atria and ventricles and represent ready-to-use comprehensive source for studies linking non-coding regulatory elements and diseases.
Comparing healthy to failing hearts, annotation of the top differentially expressed clusters revealed enrichment in embryonic processes enrichment suggesting a shift in gene expression due to re-expression of developmental genes in heart failure 12 , which are generally silent in the adult heart.To further understand the shift of regulatory elements in heart failure, we compared healthy to ischemic and non-ischemic cardiomyopathy samples.The most notable differences between healthy and failing samples included immune system responses and structural changes.Two etiologies of heart failure revealed unique features within the genome regulatory network.ICM speci cally exhibited metabolic remodeling, while NICM demonstrated a regulatory element shift towards immune response activation.ICM versus NICM distinction is further exempli ed by looking at the differential expression analysis at each cluster position.We highlighted the regulatory network of several genes that have substantial evidence in heart failure and genetic cardiomyopathies 14 .In addition to having multiple alternative TSS, genes related to cardiomyocyte structure such as cytoskeleton, sarcomere, and Z-disk functions showed differential TSS activation between healthy and failing hearts.These differences could help identify a novel set of biomarkers for cardiac disease classi cation and nd potential regulatory target regions for development of novel therapeutics.We showed that over 10% of SNPs associated with cardiovascular diseases are located in the promoter regions, outside the protein-coding regions.By using (NHGRI-EBI) GWAS catalog, we found 2,678 SNPs were signi cantly associated with TFBS, which means they can activate or silence the promoter and thus affect the expression of 455 genes.

Conclusion
We present an atlas of the genome regulatory elements of the human heart.This is a unique resource for understanding the functional impact of non-coding genetic regions on gene expression in healthy and diseased states.Aside from adding to the functional and genetic understanding of cardiac promoters and enhancers, CAGE data can be used to distinguish between healthy, ischemic, and non-ischemic hearts.The precise location of cardiac disease-related SNPs within the regulatory regions and their correlation with TFBS offers a novel understanding of the genetics of heart failure.

Study Limitations
Due to a limited number of samples, this study could not investigate the effects of race and ethnicity on the cardiac regulatory network.Additionally, not all hearts had all four chambers sequenced usually due to the poor quality of extracted RNA.It remains unclear why some failing atria samples clustered with ventricles, and some did not.
Even though we included sinoatrial node samples in the database, their analysis was out of the scope of this study which primarily focused on the comparison of healthy and failing atrial and ventricular samples.
Another manuscript is being prepared to report on sex differences in the cardiac regulatory network.

Human heart procurement and demographic
Healthy de-identi ed human hearts were procured from the Washington Regional Transplant Community (Falls Church, VA), which were not acceptable for transplantation.The cause of death was non-cardiogenic.
All protocols were approved by the George Washington University Institutional Review Board.Failing hearts were collected at INOVA Hospital (Falls Church, Virginia) during cardiac transplantation surgeries.

Tissue preparation, RNA extraction, and sequencing
Explanted non-diseased human hearts were arrested at the time of procurement with the cold cardioplegic solution and kept at +4°C during transportation to the research laboratory, typically less than 2 hours.Samples from the left and right atrial free wall, the right and left ventricular base, and inferior and superior sinus node were preserved in RNAlater (Invitrogen) overnight at +4°C and stored at -80°C until RNA extraction.Failing heart biopsies from all four chambers were obtained during surgery, immediately frozen in liquid nitrogen, and then delivered to the research laboratory on dry ice.To minimize RNA degradation during extraction, ash-frozen samples were transferred to RNAlater™-ICE Frozen Tissue Transition Solution (ThermoFisher Scienti c).According to the manufacturer's protocol, total RNA was extracted from 40 mg of tissue with the RNeasy Fibrous Tissue Mini Kit (Qiagen).RNA purity and yield were checked with Qbit and Agilent Bioanalyzer to satisfy RIN > 7 and total RNA amount 3-5 µg.This study was conducted in multiple sets: the rst set (7 hearts) included samples only from the left atria and left ventricle.This set employed noampli cation non-tagging CAGE (nAnT-iCAGE) library preparation and sequenced on Illumina HiSeq2500 High-Throughput mode with 50nt single end and with 6M reads per sample 15 .After detecting the difference between atria and ventricles, we expanded our sample selection with additional healthy (N = 14) and failing (N = 10) hearts, including all four chambers.The second and third sequencing sets used a more recent CAGE sequencing platform, single strand CAGE (ssCAGE).This methodology is complementary to nAnT-iCAGE with only a technical difference at the adapters ligation stage.This methodology allows for deeper sequencing needed to detect a full range of regulatory elements, especially enhancers with intrinsically lower expression.Samples were sequenced on Illumina HiSeq2500 by one-shot loading with 80nt and over 25 M reads per sample 16 .

CAGE data alignment, annotation, and classi cation
Sequenced reads were checked for quality with FASTQC 17 , then trimmed for quality and length with fastx_trimmer (-Q33), and adapters removed with trimmomatic 18 .Reads mapped to human ribosomal RNA locus (U13369.1)were removed.Clean reads were mapped on hg38 primary chromosomes with Burrows-Wheeler Aligner 19 , and unmapped reads were remapped with HISAT2 20 .Mapped reads were converted into CAGE transcriptional start sites (CTSS) using CAGEr Bioconductor package (sequencingQualityThreshold = 10, mappingQualityThreshold = 20, removeFirstG = TRUE, correctSystematicG = TRUE) to count 5' end of the mapped CAGE reads at single base-pair resolution 21 .Library sizes were de ned by the total number of mapped CAGE tags in output CTSS les.
To compensate for the heterogeneous pro le of the CTSS clusters, the decomposition peak identi cation (DPI) program, which relies on the independent component analysis, was used to de ne a set of reference peaks for all samples 5 .This protocol produces permissive (at least 3 mapped reads to the same location) and robust DPIs (tag per million > 1).In the subsequent analysis, we used robust DPIs only.DPIs were linked with genes through a custom-developed CAGE_peak_annotation package (+/-500 bp rule) and ChIPseeker package in R 22 .Databases for annotation included Augustus 23 , Genscan 24 , NCBI RefSeq 25 , and Gencode v37 from Gencode 26 and UCSC Table Browser 27 .Bidirectional enhancers were called and linked to genes as described by Andersson et al. 8 .Brie y, divergent CAGE TSS pairs were selected by max separation distance of 400 bp, merged into bidirectional pairs with anking windows added (200bp), ltered by directionality score, and masked by Gencode promoter and exon regions, +/-500 and 200 bp, respectively.TomeTools TSSClassi er (http://tometools.sourceforge.net/)was used to classify DPI and bidirectional enhancers into promoters and enhancers.As control datasets, we selected eukaryotic promoter database (EPD) 28 , reference TSS (refTSS) 29 , ENCODE sequences such as promoter-like sequences (PLS), proximal enhancer-like sequences (pELS), distal enhancer like sequences (dELS) 30 , and FANTOM5 enhancers, then applied classi er on 2000bp regions of DPI and bidirectional clusters.Optimal speci city/sensitivity thresholds were used for classi cation.To get consensus regions DPI clusters and bidirectional enhancers were extended up to 400bp (300 upstream, 100 downstream for DPI, and +/-200 bp for bidirectional enhancers) and collapsed if they had any overlap.Consensus regions were de ned as the highest peak in the 400 bp region belonged to EPD / refTSS / PLS but not to pELS / dELS was classi ed as a promoter, in the opposite case -as an enhancer, and the rest -ambiguous (Supplementary Figure 1A).This classi er was adjusted by the direct overlap with ENCODE regions.
Dinucleotide analysis was performed on DPI clusters centered on TSS.Classi cation of YR was given in the case of CA/CG/TA/TG and YC for CC/TC, the remaining motifs were called "other" (Supplementary Figure 1D).Vista enhancers 31 , super-enhancers 32 , and H. sapiens promoters EPD 28 were obtained from the related sources.UCSC LiftOver tool was applied when hg38 annotation was unavailable (Supplementary Figure 1E).
The WGCNA package 33 was applied for co-expression clusters identi cation on both experimental groups separately with soft thresholds 9 and 14 for groups one and two, respectively.
Overlaps and upset plots of cardiac regulatory elements from other data sources 12,34−37 were done with bedtools v2.28 comparing all heart CAGE clusters (Fig. 2, Supplementary Figure 4).We compared the entire set of robust DPI peaks, DPI peaks similar to EPD sequences (by TSSClassi er), and bidirectional enhancers with the FANTOM5 database (Supplementary Figure 5).
The number of unique genes and transcripts in the heart was counted by the association of gene models (RefSeq, Gencode, Augustus, Genscan) with DPI clusters by CAGE_peak_annotation. Genes or transcripts were considered active if associated with a CAGE cluster with 10 or more tag counts.Saturation curve built on random subsets of selected sizes from original BAM les with mapped CAGE reads.These subsets were clustered independently and connected to gene models to count numbers of genes/transcripts (Supplementary Figure 7A).
The heatmap of all heart CAGE samples was based on the Pearson correlation of TPM values for all CAGE clusters (DPI and bidirectional enhancers) (Supplementary Figure 8).

Epigenetic marker overlay with CAGE clusters
ATAC-seq data for the human heart was obtained from Broad Institute's Cardiovascular Disease Knowledge Portal 38 (Supplementary Figure 1B).Signal data for epigenetic marks of H3K4me3, H3K27ac, H3K4me1, together with PolR2A and DNase-seq, were obtained from the ENCODE project 30 (Supplementary Figure 1B-C, Supplementary Table 3).For the CAGE clusters which overlap with narrow ChIP-seq peaks, an additional rule was applied to assign class according to Jiang and Mortazavi 39 .RNA-seq data from GSE116250 and GSE128188 were aligned to the genome and clustered using cu inks 40 .Obtained transcripts were connected to DPI clusters according to the ±500bp rule.Precalculated RNA-seq data from Encode based on Gencode (v24 and v29) was associated with clusters and considered active if transcripts have nonzero FPKM values.Conservation scores phastCons from 100-way alignment and CpG islands regions were accessed from UCSC for hg38 (Fig. 2).Fractions of signals were counted using bigWigToBedGraph and bedtools v2.28 (map, groupBy) with a 5 bp window.
De novo transcription factor binding site identi cation for cluster validation Regulatory sequence motifs we analyzed using Match 41 for known TFBS and cisExpress 42,43 for de novo discovery (Supplementary Figure 6).Match tool had parameters of matrix similarity with at least 0.95 and core similarity of 1 to calculate the distribution of TFBS density.The clustering of the TBFS was conducted based on the positional density.First, position-speci c density pro les were calculated for each TFBS in the TRANSFAC database and the rare motifs (those that appear less than in 1/10 of the analyzed sequences) were excluded.Then, Pearson's correlation coe cients were computed between densities of motifs in the window of 10 nucleotides.Hierarchical clustering was achieved with the Ward method and (1-correlation)/2 as a distance using hclust 44 .Kruskal-Wallis test, and the Wilcoxon rank-sum test with continuity correction allowed to assess the signi cance of pairwise differences between clusters of TFBS.
The cisExpress algorithm enabled de novo motif discovery.This approach is based on the calculation of the Z-score, showing the in uence strength of having a nucleotide sequence w in the positional window k.The window size was selected based on the quality of the genome annotation and properties of the organism and the experiment.
where e with (w,k) and e without (w,k) were the average gene expression values with and without the word w in the position k; Stdev with (w,k) and Stdev without (w,k) were the standard deviations of gene expression values; n with (w,k) and n without (w,k) were the numbers of sequences of genes containing and not containing word w in the k th window.Words with Z-scores above a prede ned threshold were stored as primary motifs.Groups of similar motifs discovered within one window were merged, resulting in longer and/or more ambiguous motifs.This part of the algorithm produced the motifs in the form of consensus sequences and included the position window where it was discovered.

Differential expression analysis and pathway enrichment
Identi ed clusters were analyzed with edgeR for differential expression 45 .Dispersions were calculated with the GLM method, likelihood ratio tests were done with glmFit and glmLRT.Principle component analysis was used for sample dispersion visualization.The main comparison groups were 'healthy ventricle versus failing ventricle' and 'healthy atrium versus failing atrium' (Figure 3B-C).Other comparisons are shown in Supplementary Figure 11-13 were on scores calculated as z = (x-µ)/σ, where x is the log2|p-adjusted| (p-values adjusted with Benjamini-Hochberg correction and ltered to be p-adjusted < 0.05), µ is the mean, and σ is the standard deviation (Supplementary Table 4).

Transcription factor binding and single nucleotide polymorphism analysis
Heart genome-wide association study (GWAS) data with selected heart diseases for hg38 assembly was obtained from Ensembl BioMart (Supplementary Table 5).NHGRI-EBI GWAS catalog was used as another source of SNPs in our study 49 .Enrichment of the features from GWAS studies was made with enricher (clusterPro ler) for 400 (300 upstream, 100 downstream) and 1000 (900 upstream, 100 downstream) bp regions (Supplementary Figure 15A).Same heart CAGE anking regions were used for Fisher's exact test on the selected features (Supplementary Figure 15B).
Percentage of heart GWAS SNP overlapping different genomic regions (heart CAGE DPI -upstream 300 bp, downstream 100 bp, bidirectional enhancers center +/-200 bp) was counted for each chromosome separately (Figure 5).The fraction was estimated in 10M windows with 1000 bootstrap.
Local TFBS enrichment analysis was performed using MEME-Suite CentriMo with JASPAR2018 Pol2 motifs on 400 bp sequences and default parameters for all heart CAGE DPI clusters 50 .SNP effect on TFBS was found with motifbreakR 51 by processing a list of enriched motifs with heart GWAS SNPs located in 400bp regions (Figure 6, Supplementary Table 6).
Differentially DPI clusters were ltered to avoid overlap with exons, then extended to 400bp (300 upstream, 100 downstream).In case of overlap with another cluster, the cluster with a higher number of tag counts was selected.Sequences of selected regions were submitted to MEME Suite 5.3.3AME with default parameters and Jaspar 2018 CORE non-redundant vertebrate motifs.Bonferroni corrected p-values (padjusted < 0.05) were transferred to log scale and used for score calculation and clustering (Supplementary Table 4).

Online access to atlas with Zenbu reports platform
All heart CAGE data, annotations, and analysis results were integrated using the Zenbu reports platform ZENBU 3.0.1,which is provided here as an open-source database (https://fantom.gsc.riken.jp/zenbu/reports/#Atlas%20of%20cardiac%20promoters%20and%20enhancers) 52.The database contains separate tracks for permissive and robust DPIs, bidirectional enhancers, and classi ed promoters and enhancers.Results of differential expression are presented in individual tracks with highlights of statistical signi cance at FRD < 0.05.A comprehensive manual for the database usage is available in Supplementary Document 1.
All heart clusters, sample descriptions, and detailed annotations are available on NCBI GEO portal (accession number GSE150736) and in Supplementary Table respectively.

Declarations Figures
Atlas of transcribed CAGE clusters of healthy and failing human hearts.A. Experimental design, showing numbers of samples, donors, hearts, sources of samples.B. Sequenced reads were trimmed by quality, against rRNA and adapters.In total >109 reads were aligned to the genome with an average mapping ratio of 97.7%.C. Majority of reads were aligned into promoter regions of annotated genes and transcripts by Gencode v37 (on average 7.4M per library), a second large portion of reads was mapped to intergenic regions (~0.9M per library) and likely associated with enhancer activity.D. We applied two protocols of CAGE signal clustering using FANTOM5 pipelines to annotate putative promoters and enhancers.De ned clusters were classi ed using machine learning on reference regions including Encode, FANTOM5, and EPD.Then, clusters were extended and merged into consensus regions -predicted heart promoters and enhancers.E.
Total amount of DPI and bidirectional enhancers clusters, transcripts based according to overlap with annotation databases, and total transcriptionally active genes in the human heart.F. Example view showing an expressed bidirectional enhancer connected to DPI CAGE peaks.All peaks with related classi ers are available in Supplementary Table 2 and Zenbu reports platform.

Figure 4 Comparison
Figure 4

Figure 5 Heart
Figure 5