DNA methylation biomarkers of myocardial infarction and cardiovascular disease

The epigenetic landscape underlying cardiovascular disease (CVD) is not completely understood and the clinical value of the identified biomarkers is still limited. We aimed to identify differentially methylated loci associated with acute myocardial infarction (AMI) and assess their validity as predictive and causal biomarkers. We designed a case–control, two-stage, epigenome-wide association study on AMI (discovery n = 391, validation n = 204). DNA methylation was assessed using the Infinium MethylationEPIC BeadChip. We performed a fixed-effects meta-analysis of the two samples. Thirty-four CpGs were associated with AMI. Only 12 of them were available in two independent cohort studies (n ~ 1800 and n ~ 2500) with incident coronary heart disease (CHD) and CVD events, in which the Infinium HumanMethylation450 BeadChip was used. Four of the 12 CpGs were validated in association with incident CHD: AHRR-mapping cg05575921, PTCD2-mapping cg25769469, intergenic cg21566642 and MPO-mapping cg04988978. We then assessed whether methylation risk scores based on those CpGs improved the predictive capacity of the Framingham risk function, but they did not. Finally, we aimed to study the causality of those associations using a Mendelian randomization approach, but only one of the CpGs had a genetic influence and therefore the results were not conclusive. We have identified 34 CpGs related to AMI. These loci highlight the relevance of smoking, lipid metabolism, and inflammation in the biological mechanisms related to AMI. Four were additionally associated with incident CHD (three of them also with incident CVD) but did not provide additional predictive information.


Introduction
Cardiovascular disease (CVD) and more specifically coronary heart disease (CHD) remains the number one cause of death and disease burden worldwide [1,2]. At the individual level, prevention is based on the estimation of cardiovascular risk [3]. However, the sensitivity of cardiovascular risk estimation is low and a significant proportion of CHD events occurs in individuals classified as having moderate or low risk [4]. Additionally, the use of currently available drugs to control classical cardiovascular risk factors (CVRFs) does not prevent all CHD events, underlining the need to identify new strategies for reducing this residual cardiovascular risk [5]. Thus, information encoded in biological mechanisms should be unravelled to find new predictive biomarkers and potential therapeutic targets. Among these biomarkers, DNA methylation marks arise as emerging candidates.
DNA methylation is an epigenetic mechanism consisting of chemical modifications of cytosines, mostly those followed by guanines (CpGs) [6]. Epigenome-wide association studies (EWASs) make it possible to find DNA methylation biomarkers of different traits and outcomes. In fact, DNA methylation patterns are associated with multiple chronic diseases [7], including CVD and CHD [8][9][10][11][12][13]. However, the clinical value of the identified biomarkers is still limited, and the epigenetic landscape underlying CVD is not completely understood.
The most common technology to assess DNA methylation is based on commercial arrays, which do not cover the whole methylome. Moreover, most current knowledge on the relation between DNA methylation and cardiovascular risk comes from studies based on the Infinium HumanMethylation450 BeadChip (Illumina, CA, USA; from now on, 450 k) [14], which has been replaced by the Infinium MethylationEPIC BeadChip (Illumina, CA, USA; from now on, EPIC). Compared to the 450 k, EPIC interrogates 413,745 more methylation sites (but excludes 42,859), increasing the genomic coverage. Moreover, EPIC is enriched in functional sites such as enhancers, DNase hypersensitive sites, and miRNA promoter regions [15]. Thus, the new chip has the potential to identify novel DNA methylation-based biomarkers of cardiovascular events.
We hypothesized that DNA methylation is associated with myocardial infarction (MI) risk, and that some of these epigenetic marks could be predictive of future risk and have causal effects on cardiovascular outcomes. Thus, this study had three aims: 1) to identify genomic methylation loci associated with MI, 2) to assess their predictive capacity for cardiovascular risk, and 3) to assess the causality of those associations.

Quality control of DNA methylation data, cardiovascular outcomes and covariates
After quality control, we included 391 individuals (196 cases and 195 controls) in the REGICOR-1 sample (Girona Heart Registry; REgistre GIroní del COR), 204 individuals (101 cases and 103 controls) in the REGICOR-2 sample, 1,863 women in the WHI (Women's Health Initiative) sample, and 2,540 participants in the FOS (Framingham Offspring Study) sample. The main sociodemographic and clinical characteristics of the three populations are shown in Tables 1 and 2. Regarding the number of CpGs, we analysed 811,610 CpGs in the REGICOR-1 sample, 820,183 CpGs in the REGICOR-2 sample, 478,369 CpGs in the WHI sample, and 483,656 CpGs in the FOS sample. Figure 1 illustrates the steps included in this study.

Two-stage EWAS on acute myocardial infarction
Discovery stage
The associations from the discovery stage (REGICOR-1) that were taken to the subsequent validation (p-value < 10⁻⁵), together with their Manhattan and Q-Q plots, are shown in Additional file 2: Table S1 and Additional file 1: Figs. S1 and S2. In total, we identified 68 CpGs suggestively related to MI (Additional file 1: Fig. S3). Model 1 provided 56 CpGs, of which three were also found in both models 2 and 3, and 13 in model 2. One additional CpG was found in both models 2 and 3, two in model 2 and nine in model 3.

Validation and meta-analysis
The association studies performed in the validation stage included the 68 CpGs suggestively related to MI. We meta-analysed the results of those 68 associations from both stages. We identified 34 differentially methylated CpGs related to MI, with similar effect sizes in all three models for most of the CpGs (Table 3, and Additional file 2: Table S2).

Follow-up association studies on incident CHD and CVD events
Out of the 34 identified CpGs associated with MI, only 12 were available in the samples with incident cases (whose DNA methylation was profiled with the 450 k array). In total, we validated four CpGs after the meta-analysis of the separate association studies in the WHI and the FOS samples (p-value < 0.05/12 = 4.17 × 10⁻³): AHRR-mapping cg05575921, PTCD2-mapping cg25769469, intergenic cg21566642 and MPO-mapping cg04988978. The four CpGs were associated with CHD but cg25769469 was not related to CVD (Table 4, Additional file 2: Table S3).
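The significance threshold quoted above follows directly from the Bonferroni correction. The short Python sketch below reproduces that arithmetic; the function name is ours and purely illustrative, and it is not part of the authors' R code.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold alpha / n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# 12 of the 34 MI-associated CpGs were available on the 450 k array.
follow_up_threshold = bonferroni_threshold(0.05, 12)
print(f"{follow_up_threshold:.2e}")  # 4.17e-03
```

A CpG is considered validated in the follow-up analysis only if its meta-analysis p-value falls below this value.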

Association between MRSs and incidence of CHD and CVD
The associations between the methylation risk scores (MRSs) and the incidence of coronary (n = 94) and cardiovascular (n = 222) events in the FOS population are shown in Additional file 2: Table S4. The median follow-up periods for CVD and CHD incidence were 7.67 and 7.87 years, respectively. The MRSs were not associated with higher cardiovascular risk independently of the classical CVRFs. Consistently, the addition of any of the MRSs to the Framingham risk function did not improve its predictive capacity in the FOS cohort (Additional file 2: Table S4).

Causality of the associations between DNA methylation and cardiovascular outcomes
Of the four identified CpGs, only cg21566642 showed a genetic influence; its methylation levels in adolescence were associated with rs72617176 and in childhood with rs139595493. We did not have individual data to test the first and second Mendelian randomization assumptions, but the meQTLs were associated with the CpG methylation levels at genome-wide significance independently of age, sex or ancestry principal components [16]. Only the Wald ratio method could be conducted, since it uses a single instrumental variable. The results did not support a causal effect of methylation at cg21566642 on either MI or CHD (Additional file 2: Table S5). We could not perform sensitivity analyses for pleiotropic effects or assess instrument strength. The other three CpGs could not be instrumented.
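For readers unfamiliar with the Wald ratio, the sketch below illustrates the estimator for a single instrument: the SNP-outcome effect divided by the SNP-exposure effect, with a first-order standard error that ignores uncertainty in the SNP-exposure estimate. This is a minimal Python illustration under those assumptions, not the TwoSampleMR implementation used in the study, and all input values are hypothetical.

```python
import math


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float):
    """Single-instrument causal estimate (Wald ratio).

    beta_exposure: SNP effect on the exposure (here, CpG methylation).
    beta_outcome:  SNP effect on the outcome (e.g. log-odds of CHD).
    se_outcome:    standard error of beta_outcome.
    Returns (estimate, standard error, two-sided p-value).
    """
    if beta_exposure == 0:
        raise ValueError("the instrument must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    se = se_outcome / abs(beta_exposure)
    z = estimate / se
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Hypothetical summary statistics, for illustration only.
estimate, se, p_value = wald_ratio(beta_exposure=0.5, beta_outcome=0.02, se_outcome=0.04)
print(estimate, se, p_value)
```

Because the estimate rests on one variant, it cannot be checked against other instruments, which is why methods that test for pleiotropy are unavailable in this setting.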

Discussion
We have identified 34 methylation loci associated with acute MI in a two-stage EWAS, analysing ~ 850,000 CpGs. All but two of these MI-associated sites (cg05575921 located in AHRR and the intergenic cg21566642) are newly reported. Of those, 12 CpGs could be studied in association with incident cases of CHD and CVD, and we identified four of them associated with incident CHD (three of them also with incident CVD). All four were also related to traditional CVRFs, supporting their role in the development of these diseases. However, their clinical utility as predictive biomarkers or drug targets was not proven.
Recently, two EWASs on incident CHD were published, providing different findings from ours. Ward-Caviness et al. found nine CpGs associated with incident acute MI [9]. Agha et al. reported 52 CpGs related to incident CHD [8]. None of them was replicated in our study. This lack of concordance could be related to methodological differences (incident vs prevalent cases; myocardial infarction vs CHD; considered confounder variables; characteristics of the populations), and highlights the complexity of the study of these diseases.

CpG sites associated with acute MI events
The 34 identified CpGs showed similar effect sizes in the two REGICOR samples and we considered them potentially relevant. Similarly, all but three CpGs (AHRR-mapping cg05575921, F2RL3-mapping cg03636183, and the intergenic cg21566642) showed consistent effect sizes in the three models. The effect size of those three was reduced by half when adjusted for smoking, which highlights the important role of this risk factor in the MI context. In fact, all three sites are widely described to be related to smoking [17][18][19].
Nonetheless, the case-control design of our initial discovery sample limits the inference of the biological sequence of the epigenetic marks, the related biological mechanisms, and the clinical event. One possible scenario could be that the identified DNA methylation marks occurred before the acute event, as potential biological mechanisms involved in MI pathogenesis. This may be the case of the three CpGs that were related to smoking. Conversely, as blood samples of MI cases were collected within the initial 24 h after hospitalization, the other possibility could be that methylation at the identified CpGs had changed as a consequence of the acute event or the therapeutic procedures. If the first scenario can be proven in further studies, these DNA methylation marks could be potential predictive biomarkers of MI or new therapeutic targets. If they are found to be post-MI marks, further studies could evaluate their potential as biomarkers of prognosis.

CpG sites consistently related to prevalent and incident CVD events
Twelve of the 34 identified CpGs could be evaluated in prospective samples and four of them were also related to incident cases of CHD. cg21566642 maps to an intergenic region, and cg05575921, cg04988978 and cg25769469 annotate to AHRR, MPO and PTCD2, respectively. To our knowledge, these CpGs were not associated with cardiovascular events in previous EWAS reports. cg21566642 and cg05575921 were highly and inversely associated with smoking, which is supported by previous EWAS [18,19]. We have also previously reported both CpGs as related to age-independent cardiovascular risk [13], and they have been related to all-cause mortality in an EWAS [22]. cg05575921 was further associated directly with cholesterol in high-density lipoproteins (HDL-C) and inversely with cholesterol in low-density lipoproteins (LDL-C) and triglyceride levels in our study. This CpG has been related to both CHD prevalence and incidence in a candidate gene study [23].
cg04988978 and cg25769469 annotate to MPO and PTCD2, respectively. Both CpGs were associated directly with HDL-C and inversely with triglyceride and glucose levels. MPO encodes myeloperoxidase, which promotes atherosclerotic lesions by enhancing APOB oxidation within low-density lipoproteins [24] and was causally associated with incident cardiovascular outcomes [25]. One CpG located within PTCD2 was previously identified to be associated with hypertension in obstructive sleep apnea patients [26], and genetic variants in this gene have been related to blood pressure [21].

MRSs as predictive CVD biomarkers
To assess the value of the four identified CpGs as predictive biomarkers, we followed the AHA recommendations [27]. However, we observed neither an independent association between the MRSs and the incidence of CVD events in the FOS, nor an improvement in the predictive capacity of the Framingham risk function when including these scores. This highlights how difficult it is for novel biomarkers to improve cardiovascular risk prediction.

Causality of the associations between methylation loci and cardiovascular outcomes
The association of the four CpGs not only with acute MI but also with incident CHD may suggest that DNA methylation changes at those loci occur prior to the event. However, this association does not establish whether differential DNA methylation at those loci has a causal effect on CHD. Mendelian randomization can be used to assess this causal relationship, but this approach could only be undertaken for cg21566642. Although a non-causal relationship was suggested, this must be interpreted with caution: with a single genetic instrumental variable, this framework cannot rule out linkage disequilibrium between the meQTL and a causal variant for CHD, reverse causation, or horizontal pleiotropy [28,29]. Moreover, cg21566642 showed a genetic influence in childhood and adolescence, while CHD events typically occur during adulthood.

Strengths and limitations
The main strength of our study is that it is the first two-stage EWAS on MI to be based on more than 800,000 CpGs across the genome. Moreover, we aimed to validate our findings in prospective samples of CHD and CVD as a proxy of MI. Also, we aimed to prove the clinical relevance of our findings. However, some limitations should be acknowledged. First, two thirds of the CpGs identified in the initial case-control study could not be assessed in the incident studies as the methylation arrays differed in the number of CpGs (EPIC vs 450 k, respectively). Second, we used self-reported information about cardiovascular risk factors in the case-control study, as an event such as MI modifies risk factor levels during the acute phase. Third, we cannot infer causality since changes in methylation could have occurred as a consequence of the acute phase and disease management of the MI event. We aimed to perform MR studies of the association between the identified CpGs and cardiovascular events, but available methylation quantitative trait loci (meQTL) datasets are still limited. Last, our study is based on populations of European origin and the results cannot be extrapolated to other populations.

Conclusion
Our study identifies 34 DNA methylation loci related to MI, all but two of them newly reported. The results shed some light on the molecular landscape of MI, highlighting the importance of traditional CVRFs and inflammation in the development of CHD. Our results question the relevance of DNA methylation as a predictive biomarker.

Study design and populations
We designed an EWAS using three populations: the Girona Heart Registry (REGICOR, REgistre GIroní del COR), the Women's Health Initiative (WHI), and the Framingham Offspring Study (FOS). We first performed a two-stage EWAS on acute MI using two independent age- and sex-matched case-control studies designed in REGICOR. Then, we validated the results in the other two populations with incident cases of CHD and CVD.

Case-control studies of acute MI in REGICOR
The sample used in the discovery stage (REGICOR-1) involved 416 individuals (208 MI cases and 208 controls). The sample in the validation stage (REGICOR-2) comprised 208 individuals (104 cases and 104 controls). Cases were selected from patients who were consecutively attended for a first acute MI in the reference hospital of the monitored area, in the province of Girona, in the northeast of Spain. Women were overrepresented to achieve their inclusion as 50% of our sample. Controls were participants in a population-based survey performed in the same monitored area. They were randomly selected from those attending the 2009-2013 follow-up visit (n = 4980), and matched by age and sex with the MI cases. All participants were of European descent and provided informed written consent. The study was approved by the local ethics committee (2015/6199/I; 2018/7855/I) and meets the principles expressed in the Declaration of Helsinki and the relevant Spanish legislation.

Samples with incident cases of CHD and CVD
The WHI sample is a case-control study nested in a cohort. The FOS sample is a prospective cohort study. Both samples were available in the database of Genotypes and Phenotypes (http://dbgap.ncbi.nlm.nih.gov; Project Number #9047). The graphical abstract shows the design and flow-chart of this study.

Assessment of cardiovascular outcomes
The outcomes assessed were acute MI in REGICOR, and incident CHD and CVD in the WHI and FOS samples. Additional details are provided in the Additional file 1: Methods.

Assessment of DNA methylation
DNA methylation was assessed genome-wide from peripheral blood with commercial arrays from Illumina (CA, USA). The Infinium MethylationEPIC BeadChip, covering over 850,000 CpGs, was used in the REGICOR samples. The Infinium HumanMethylation450 BeadChip, covering over 480,000 CpGs, was used in the WHI and FOS samples. A detailed quality control pipeline for the methylation data is available in the Additional file 1: Methods. Methylation status at each CpG was reported by β-values [30].
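As a reminder of what a β-value represents, the following Python sketch computes it from the methylated and unmethylated probe intensities as M / (M + U + offset). The offset of 100 is the value conventionally recommended for Illumina arrays and is an assumption here; the text does not state which offset, if any, was applied in this study, and the intensities below are hypothetical.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Proportion of methylation at a CpG from the two probe intensities.

    The offset stabilises the ratio when both intensities are low
    (100 is the commonly used default; assumed, not taken from the study).
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities: a mostly methylated and a mostly unmethylated site.
print(beta_value(9000, 900))
print(beta_value(400, 9500))
```

β-values lie between 0 (unmethylated) and just under 1 (fully methylated), which makes them directly interpretable as approximate methylation proportions.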

Covariates
In the REGICOR case-control studies the following covariates were considered: self-reported smoking, diabetes, hypercholesterolemia and hypertension (Additional file 1: Methods). In the WHI and FOS studies self-reported smoking and glycaemia, total and HDL cholesterol, and blood pressure measurements were considered. Moreover, we inferred the peripheral blood cell counts with the FlowSorted.Blood.450k R package [31]. We also estimated two surrogate variables for unknown sources of potential technical or biological confounding using the sva R package [32].

Statistical analysis
All statistical analyses were performed using R version 3.4.0. The code for the Singularity images used to run the EWASs in the high performance computing system of the Hospital del Mar Medical Research Institute is available in the repositories at https://github.com/regicor/methylation_ami/. A detailed description of the statistical methods is provided in the Additional file 1: Methods.

Association between DNA methylation and cardiovascular outcomes
Logistic regression was used in the analyses in the REGICOR and WHI samples, while Cox regression was used in the FOS sample. We considered the cardiovascular event (acute MI, CHD or CVD) as the outcome and DNA methylation as the exposure. We defined three models. Model 1 was adjusted for estimated cell counts and two surrogate variables (plus age and ethnicity in the WHI sample, plus age and sex in the FOS sample). Model 2 was additionally adjusted for smoking. Model 3 was further adjusted for diabetes, hypercholesterolemia and hypertension.
In order to reduce epigenomic inflation, we corrected the coefficients, the standard errors and the p values using the bacon R package if necessary [33]. The bacon R package controls for bias and inflation using a Bayesian method based on the estimation of the empirical null distribution and was used in previous EWAS [33][34][35]. We used coefficients and standard errors from the regression models as the input data and we set a random seed at 123.
We selected those associations from the discovery stage (REGICOR-1) with a corrected p-value < 10⁻⁵ for assessment in the validation stage (REGICOR-2). Moreover, we performed a fixed-effect meta-analysis of the corrected effect sizes observed in both stages, weighted by the inverse of the variance. Thereafter, we studied the association of the identified CpGs with incident CHD and with CVD events in the WHI and the FOS samples, separately. The results from both samples were meta-analysed (for CHD and CVD, separately). We used the Bonferroni criterion to correct for multiple comparisons (0.05 divided by the number of probes analysed in each specific analysis).
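The inverse-variance weighting used in a fixed-effect meta-analysis can be sketched in a few lines of Python. Each study contributes its coefficient with weight 1/SE², and the pooled standard error is the square root of the inverse of the summed weights. The function and the example coefficients are illustrative assumptions, not the authors' R code or results.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled estimate, pooled standard error, two-sided p-value).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per coefficient")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Hypothetical coefficients for one CpG in the discovery and validation samples.
pooled, pooled_se, p_value = fixed_effect_meta([-0.30, -0.20], [0.10, 0.20])
print(pooled, pooled_se, p_value)
```

In this example the more precise study (SE = 0.10) carries four times the weight of the other, so the pooled estimate lies closer to its coefficient.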

Association between the identified CpGs and CVRFs
We analysed whether the methylation levels of the identified CpGs were associated with individual CVRFs in the four samples using multiple linear regression, and then meta-analysed the results. We defined DNA methylation as the outcome and adjusted for age and sex in the case of the REGICOR and the Framingham populations, and for age and ethnicity in the WHI sample. In the case of the REGICOR samples, the continuous variables were only available for the control individuals. We meta-analysed the results from the four populations using a fixed-effects meta-analysis weighted by the inverse of the variance. The p value threshold was estimated as 0.05 divided by the multiplication of the number of CVRFs and the number of CpGs assessed.

Methylation risk scores (MRSs) and predictive capacity
We developed two weighted MRSs based on the CpGs identified, each of them using the results from the meta-analyses of incident CHD or CVD, respectively. We evaluated the association between these scores and CHD and CVD incidence, respectively, in the FOS sample, using Cox regression. All analyses were adjusted for age, sex, diabetes, smoking, systolic blood pressure, hypertensive treatment, and levels of total cholesterol and HDL-C [36]. We also assessed the potential added predictive value of including the MRSs in the Framingham risk function. We evaluated the increase in the discrimination and the reclassification.
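A weighted MRS of this kind is typically a linear combination of an individual's β-values, with one weight per CpG. The Python sketch below assumes that the weights are the meta-analysis coefficients, which the text implies but does not spell out; the weights and β-values shown are hypothetical and are not the study's estimates.

```python
def methylation_risk_score(beta_values: dict, weights: dict) -> float:
    """Weighted sum of CpG beta-values for one individual.

    beta_values: CpG identifier -> methylation beta-value.
    weights:     CpG identifier -> weight (assumed: meta-analysis coefficient).
    """
    missing = sorted(set(weights) - set(beta_values))
    if missing:
        raise KeyError(f"missing beta-values for: {', '.join(missing)}")
    return sum(weights[cpg] * beta_values[cpg] for cpg in weights)


# Hypothetical weights and beta-values, for illustration only.
example_weights = {"cg05575921": -2.0, "cg21566642": -1.0}
example_betas = {"cg05575921": 0.8, "cg21566642": 0.5}
score = methylation_risk_score(example_betas, example_weights)
print(score)
```

The resulting score can then enter a Cox model as a single covariate alongside the classical risk factors.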

Causality of associations between DNA methylation and cardiovascular outcomes
We performed two-sample Mendelian randomization analyses using the MR-Base platform [37]. We used the MRInstruments R package to select the instrumental variables and the TwoSampleMR R package to run the analyses. First, we considered those methylation quantitative trait loci (meQTL) from the Accessible Resource for Integrated Epigenomic Studies (ARIES) project [16] included in the MR-Base database [37]. Then, we interrogated their association with MI and with CHD using summary statistic data from a meta-analysis of GWAS on CHD [38]. A more detailed description of the analysis is included in the Additional file 1: Methods.