Blood DNA methylation sites predict death risk in a longitudinal study of 12,300 individuals

DNA methylation has fundamental roles in gene programming and aging that may help predict mortality. However, no large-scale study has investigated whether site-specific DNA methylation predicts all-cause mortality. We used the Illumina HumanMethylation450 BeadChip to identify blood DNA methylation sites associated with all-cause mortality in 12,300 participants from 12 cohorts of the Heart and Aging Research in Genetic Epidemiology (CHARGE) Consortium. Over an average 10-year follow-up, there were 2,561 deaths across the cohorts. Nine sites mapping to three intergenic and six gene-specific regions were associated with mortality (P < 9.3 × 10⁻⁷) independently of age and other mortality predictors. Six sites (cg14866069, cg23666362, cg20045320, cg07839457, cg07677157, cg09615688), mapping respectively to BMPR1B, MIR1973, IFITM3, NLRC5, and two intergenic regions, were associated with reduced mortality risk. The remaining three sites (cg17086398, cg12619262, cg18424841), mapping respectively to SERINC2, CHST12, and an intergenic region, were associated with increased mortality risk. DNA methylation at each site predicted 5%–15% of all deaths. We also assessed the causal association of those sites with age-related chronic diseases by using Mendelian randomization, identifying weak evidence of a causal relationship of cg18424841 and cg09615688 with coronary heart disease. Of the nine sites, three (cg20045320, cg07839457, cg07677157) were associated with lower incidence of heart disease, and two (cg20045320, cg07839457) were associated with smoking and inflammation in prior CHARGE analyses. Methylation of cg20045320, cg07839457, and cg17086398 was associated with decreased expression of nearby genes (IFITM3, IRF, NLRC5, MT1, MT2, MARCKSL1) linked to immune responses and cardiometabolic diseases. These sites may serve as useful clinical tools for mortality risk assessment and preventative care.


INTRODUCTION
The human epigenome contains DNA methylation marks that progressively change as we age. DNA methylation can influence gene expression and manifests in response to both environmental and hereditary factors [1,2]. Biological age estimations, constructed from DNA methylation marks and referred to as "epigenetic aging clocks", have been associated with environmental exposures, morbidities, and mortality [9][10][11][12][13]. As these clocks were designed to track chronological age, not to predict mortality, further study is necessary to fully elucidate indicators of all-cause mortality. To date, no large-scale analysis has been conducted to identify variations in DNA methylation at individual 5'-cytosine-phosphate-guanosine-3' (CpG) sites associated with future mortality risk. Here, we present an epigenome-wide methylation analysis of 12,300 participants and 2,561 (21%) deaths from 12 American and European cohorts to determine whether site-specific DNA methylation predicts all-cause mortality, independent of age, lifestyle factors, and clinical predictors of mortality including comorbidities. We also assessed the causal relationship of identified sites with age-related chronic diseases using Mendelian randomization approaches, and we related the sites to epigenetic aging clocks and a mortality risk score, an epigenetic indicator of mortality previously created and validated with DNA methylation arrays in two European cohorts.
AGING
257 FDR-significant (P < 3.03 × 10⁻⁵) CpGs in a basic model adjusting for age, sex, technical covariates, and white blood cell proportions (Figure 2A and Supplementary Table 2). We also identified three Bonferroni-significant and nine FDR-significant (P < 9.3 × 10⁻⁷) CpGs in a fully-adjusted model also adjusting for education, smoking status, pack-years smoked, body mass index, recreational physical activity, alcohol consumption, hypertension, diabetes, and history of cancer and coronary heart disease (Figures 2B, 3A and Supplementary Table 3). For 188 (73%) basic-adjusted FDR-significant CpGs and six (67%) fully-adjusted CpGs, higher blood DNA methylation was associated with lower all-cause mortality (Figure 2 and Supplementary Tables 2, 3).
Meta-analysis results did not appear to suffer from systematic bias due to unmeasured confounding, as assessed by genomic inflation (basic model: λ = 1.12,  Table 5). Cohort-specific inflation was also minimal, with lambdas close to one for most cohorts. Volcano plots showed symmetry in the direction of the associations with all-cause mortality (Figure 2). All nine fully-adjusted FDR-significant CpGs showed low to medium heterogeneity (Supplementary Table 7) and a consistent magnitude of the estimated HRs across studies (Figure 3A). We further validated our results by excluding cohorts with a high proportion of deaths (30%) and inflation (λ > 1.5). In these sensitivity analyses, HRs for the nine FDR-significant CpGs were consistent with the main results in terms of direction, magnitude, and statistical significance (Supplementary Figure 1 and Supplementary Tables 8, 9).
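The genomic inflation check can be illustrated with a minimal Python sketch. It assumes that each CpG contributes a Wald z-statistic (log HR divided by its standard error) and uses the common definition λ = median(z²) / 0.4549, where 0.4549 is the median of a chi-square distribution with one degree of freedom. The function name and inputs are hypothetical, and the text does not state how λ was actually computed.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Return lambda = median(z^2) / median of chi-square(1 df).

    z_scores: iterable of per-CpG Wald z-statistics (assumed input).
    Values near 1 suggest little systematic inflation of test statistics.
    """
    chi2_stats = [z * z for z in z_scores]
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value of λ well above 1 for a cohort would flag it for the kind of sensitivity analysis described above.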
Three of the nine fully-adjusted FDR-significant CpGs (cg20045320, cg07677157, cg07839457) were associated with lower incidence of coronary heart disease (P < Bonferroni threshold of 0.005) (Figure 4 and Supplementary Table 10).

Miettinen's population attributable fraction, epigenetic aging clocks, and mortality risk score
To assess the extent to which methylation levels of each CpG predict all-cause mortality, we calculated Miettinen's population attributable fraction on data from the Normative Aging Study (NAS) and the Women's Health Initiative-Epigenetic Mechanisms of Particulate Matter-Mediated Cardiovascular Disease (WHI-EMPC) for European and African American ancestries. DNA methylation levels above the average at each CpG predicted, individually and independently of other factors, 5%-15% of all deaths (Figure 3C and Supplementary Table 11). In the same datasets, all nine CpGs were associated with age, cumulative smoking, body mass index, and physical activity (P < 0.05). Seven of the nine CpGs (cg17086398, cg14866069, cg23666362, cg20045320, cg07677157, cg07839457, cg09615688) had negative relationships with age (Supplementary Table 12). Seven CpGs were strongly associated with epigenetic aging clocks and mortality risk scores; all significant associations had the same direction and similar magnitude across the four epigenetic aging clocks (Supplementary Table 13), even though none of those sites was included in any of the clocks. Those CpGs had consistent and independent associations with all-cause mortality when adjusted for epigenetic aging clocks and mortality risk scores (Supplementary Tables 14, 15). In the overall meta-analysis, we identified 57 of the 58 CpGs of the risk score, and those sites had low to moderate association with DNA methylation levels at our FDR-significant CpGs with a balance between  Figure 2A, 2B). In the overall meta-analysis, the association between all-cause mortality and DNA methylation levels at the majority (34 out of 58) of mortality risk score CpGs had a direction consistent with previous results. Among those CpGs, only two (cg25193885 and cg19859270) showed nominally significant association with mortality (Supplementary Figure 2A, 2B).

Pathways analyses and DNA methylation integration with quantitative trait loci analysis (meQTL) and with gene expression (eQTM)
Extended genome-wide enrichment analysis showed that two of the CpGs (cg07839457 and cg17086398) mapped to genes (NLRC5 and SERINC2, respectively) previously associated with high-density lipoprotein cholesterol levels (FDR P = 0.02) and alcohol dependence (FDR P = 0.004) in genome-wide association studies (GWAS) analyses (Supplementary Table 16) [14]. We confirmed these results using Database for Annotation, Visualization and Integrated Discovery (DAVID) and KEGG, identifying and testing for enriched underlying biological processes in publicly available gene ontology databases (Supplementary Tables 17, 18).
To characterize the functional relevance of FDR-significant CpGs, we performed covariate-adjusted methylation quantitative trait locus (meQTL) analyses using available unique single-nucleotide-polymorphism (SNP)-CpG combinations from 713 participants in the Cooperative Health Research in the Region Augsburg (KORA) study [15]. We identified nine Bonferroni-significant unique cis-regulatory polymorphisms associated with two 1000 bp-distant CpGs (cg09615688, cg18424841) (Supplementary Figure 3A and Supplementary Table 19). None of the nine identified polymorphisms overlapped with previous genetic results from the National Human Genome Research Institute-EBI GWAS Catalog (Supplementary Table 16).
We also evaluated expression quantitative trait methylation (eQTM) associations using 998 KORA participants. We identified three CpGs with FDR-significant associations with decreased leukocyte expression levels of nearby genes, among the 13,351 unique associations between gene expression and DNA methylation levels at FDR-significant fully-adjusted CpGs. Namely, DNA methylation levels of cg07839457 (in NLRC5) were associated with NLRC5 expression as well as with that of a ~300 Mb-distant set of metallothionein (MT) 1 and 2 genes, which are linked to oxidative stress and immune responses [16,17]. DNA methylation of cg17086398 in SERINC2 was inversely associated with myristoylated alanine-rich C-kinase substrate like 1 (MARCKSL1) expression, which is involved in migration of cancer cells [18]. DNA methylation at cg20045320 in IFITM3 was associated with lower expression of IFITM3 and IRF, which have a critical role in immune responses (Supplementary Figure 3B and Supplementary Table 20) [6,19].
Finally, we used functional mapping and annotation to examine tissue-specific expression. Genes identified in the fully-adjusted model showed universal expression at varying levels across tissues. IFITM3 was highly expressed in all tissues; BMPR1B showed low expression across all tissues, except for moderate expression in the prostate and tibial nerve. The remaining genes had moderate or low expression in a wide range of tissues, except for SERINC2, which showed high expression in the liver, kidney, salivary gland, and esophagus. MIR1973 was not represented in the dataset (Figure 3D).

Mendelian randomization
To evaluate the causal relationship of FDR-significant CpGs to mortality-related risk factors and diseases, we performed two sets of Mendelian randomization analyses using methQTL data from KORA and publicly available ARIES data. Only two FDR-significant CpGs (cg18424841 and cg09615688) overlapped with methQTLs in either KORA or ARIES and with SNPs associated with coronary heart disease (CHD) or kidney function. A GWAS assessing longevity and age-related chronic diseases (CHD and kidney function) [34][35][36][37][38] showed no overlap with KORA and ARIES methQTLs even when using a moderate threshold for proxy variants (proxy r² > 0.75). In KORA, cg09615688 showed evidence of a positive causal effect on CHD (OR = 1.51; 95% CI = 1.02, 2.23; Wald ratio method), directionally consistent with its association with mortality in the overall meta-analysis. However, this site was not represented in ARIES methQTL data. The site cg18424841 had multiple variants in KORA methQTL data and a single variant in ARIES methQTL data. We did not observe consistent evidence of a causal effect of cg18424841 on CHD: weak evidence for a causal effect was observed in ARIES using the Wald ratio method but not in KORA using pleiotropy-robust, multi-variant, or Wald ratio methods. We did not find evidence for a causal effect of cg18424841 on kidney function in either KORA or ARIES (Supplementary Table 21).
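For readers unfamiliar with the Wald ratio method named above, a minimal Python sketch of the single-variant estimator follows. It assumes the inputs are the SNP-CpG effect from methQTL data and the SNP-CHD log-odds effect with its standard error from a GWAS, and it uses the first-order (delta-method) standard error that ignores uncertainty in the SNP-CpG effect. Function names are hypothetical, and this is not necessarily the exact implementation the authors used.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant Wald ratio estimate of the exposure-outcome effect.

    beta_exposure: SNP effect on CpG methylation (assumed from methQTL data).
    beta_outcome:  SNP effect on the outcome, on the log-odds scale.
    se_outcome:    standard error of beta_outcome.
    Returns (estimate, standard_error) using a first-order approximation.
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error


def odds_ratio_with_ci(estimate, standard_error, z=1.96):
    """Convert a log-odds estimate to (OR, lower 95% CI, upper 95% CI)."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * standard_error),
        math.exp(estimate + z * standard_error),
    )
```

With several independent variants for one CpG, the per-variant ratios would instead be combined by a multi-variant method, which is why the single-variant ARIES result and the multi-variant KORA results are reported separately.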

Cell-type fractions and all-cause mortality
Cell-type fractions, mostly neutrophil-lymphocyte ratio (NLR), have often been associated with comorbidities and mortality and are recognized to influence DNA methylation levels [20][21][22]. Using NAS data, we found that NLR was significantly associated with all-cause mortality only when models were not adjusted for Houseman cell proportions (Supplementary Table 22). Interestingly, NLR had no significant association with all-cause mortality when we adjusted for DNA methylation levels at cg07839457, which maps to the immune-related gene NLRC5. However, the contribution of NLR to mortality at that specific site may be minimized by the adjustment for prior history of cancer and comorbidities in all models.

DISCUSSION
This study is the largest to date investigating site-specific DNA methylation and all-cause mortality. We identified new whole blood DNA methylation marks that predict all-cause mortality risk, independent of chronological age, lifestyle habits, and morbidity. These newly identified sites may be useful in developing clinical tools for risk assessment and preventive intervention strategies to reduce mortality.
All nine FDR-significant CpGs demonstrated novel association with all-cause mortality and were not part of epigenetic aging clocks or mortality risk scores [9,[11][12][13]. Further, the CpGs were associated with mortality independently of epigenetic aging and mortality signatures. All-cause mortality was associated with a mortality risk score in a model including seven FDR-significant CpGs, although those associations may be driven by the inclusion of CpGs related to our FDR-significant sites. This suggests that whole blood DNA methylation levels at FDR-significant CpGs may be sentinels for epigenetic disruptions leading to aging acceleration and contributing to mortality. In addition, the association between DNA methylation levels at FDR-significant CpGs and chronological aging may suggest that those CpGs are stronger independent biomarkers of aging than other epigenetic aging signatures.
In previous CHARGE meta-analyses [3,4], DNA methylation at two of the newly identified CpGs, cg20045320 and cg07839457 (mapping to interferon induced transmembrane protein 3 [IFITM3] and NLRC5), was associated with smoking and cardiovascular-related chronic inflammation, respectively, both of which are risk factors for mortality. Cardiovascular disease, especially CHD, is a major contributor to mortality [23]. The direction of association with incident heart disease was consistent with that of all-cause mortality. Thus, DNA methylation at these CpGs may contribute to development and progression of CHD and, consequently, to risk of death. To test this idea, we used a Mendelian randomization approach and identified one site, cg09615688, with a causal effect on CHD in KORA data and weak evidence for a causal effect of cg18424841 on CHD in ARIES data.
Expression of several genes mapped to the fully-adjusted FDR-significant CpGs has been associated with mortality predictors and mortality. Elevated and persistent gene expression levels of NLRC5, a master regulator of the immune response [16], have demonstrated an inverse correlation with familial longevity and mortality predictors, such as elevated blood pressure, arterial stiffness, chronic levels of inflammatory cytokines, metabolic dysfunction, and oxidative stress [5]. In addition, expression of IFITM3 provides an essential barrier to influenza A virus infection in vivo and in vitro. Absence of IFITM3 leads to uncontrolled viral replication and a predisposition to morbidity and subsequent mortality [6]. Further, expression of BMPR1B enhances cancer cell migration, and approaches targeting BMPR1B inhibit metastatic activity in breast cancer [7]. Finally, expression of MIR1973, part of a family of microRNAs, increases resistant lung adenocarcinoma cells, with subsequent low apoptosis intensity [8]. This body of evidence may suggest an active role of DNA methylation levels in regulating relevant gene expression and reducing all-cause mortality risk.

The overall meta-analyses included 12 cohorts with varying biological age and mortality. There was a balance between six cohorts with long (≥10 years) and six cohorts with short (<10 years) average time to follow-up or death. All cohorts showed consistency in magnitude and direction for the association with mortality of four CpGs (cg12619262, cg20045320, cg07839457, cg18424841). Two studies (FHS-Study 1 and KORA) showed non-significant associations in the opposite direction to the rest of the cohorts for several CpGs (FHS-Study 1: cg14866069, cg23666362, cg09615688; KORA: cg17086398, cg14866069, cg23666362). However, both cohorts had among the shortest average time-to-death (FHS-Study 1: 6.1 years; KORA: 4.4 years) and youngest average population age (FHS-Study 1: 65 years; KORA: 61 years). Both cohorts also made a limited contribution to our meta-analysis because of their small number of deaths (FHS-Study 1: 62; KORA: 42). Our results may indicate that DNA methylation levels at these select CpGs were relevant for mortality risk prediction over longer time-to-death in both adults and older-age adults.
Cell-type fractions, including NLR, as related to cancer and systemic inflammation, have been related to mortality in different populations [20][21][22]. When we excluded Houseman cell proportions, NLR was strongly associated with mortality at all CpGs except cg07839457, which maps to the immune-related gene NLRC5. This may suggest that the contribution of NLR to mortality is minimized when controlling for prior history of cancer and related comorbidities.
In summary, we identified nine CpGs that showed a novel association with all-cause mortality, were responsive to several external stimuli including alcohol consumption and smoking, and were associated with mortality more than 10 years before death. These sites thus may be considered sentinels for epigenetic disruptions leading to age-related disease, such as cardiovascular disease, and contributing to mortality. Further studies are needed to confirm these associations in other tissues and in different populations.

Participating cohort studies
Our meta-analysis included 12,300 participants from 12 population-based cohorts of the Heart and Aging Research in Genetic Epidemiology (CHARGE) Consortium. For each participant, we derived years of follow-up using the time between the blood draw used for DNA methylation analysis and death or last follow-up. Each cohort excluded participants with diagnosed leukemia (ICD-9: 203-208) or undergoing chemotherapy treatment, both of which modify blood-derived data [24,25]. All participating cohorts shared cohort descriptive statistics and results files from pre-specified in-house mortality analyses (Figure 1). Further information about death ascertainment, covariate measurement and harmonization, protocols, and methods of each cohort is included in the Supplemental Materials. The institutional review committees of each cohort approved this study, and all participants provided written informed consent. Data and analytical code that support our findings are available from the corresponding author upon request.

Blood DNA methylation measurements and quality control
Each cohort independently conducted laboratory DNA methylation measurements and internal quality control. All samples underwent bisulfite conversion via the EZ-96 DNA Methylation kit (Zymo Research) and were processed with the Illumina Infinium HumanMethylation450 (450K) BeadChip (Illumina) at Illumina or in cohort-specific laboratories. Quality control of samples included exclusion on the basis of Illumina's detection P-value, low sample DNA concentration, sample call rate, CpG-specific percentage of missing values, bisulfite conversion efficiency, gender verification with multidimensional scaling plots, and other quality control metrics specific to cohorts. Each cohort used validated statistical methods for normalizing methylation data on untransformed methylation beta values (ranging 0-1). Some cohorts also made independent probe exclusions. Further details are provided in the Supplemental Material. For the meta-analysis, additional probe exclusions were made across all cohorts. Specifically, we excluded control probes, non-CpG sites, probes that mapped to allosomal chromosomes, cross-reactive CpGs, probes with underlying SNPs within 10 bp of the CpG sequence, non-varying CpGs defined by an interquartile range of <0.1%, CpGs with ≥10% of missing information, and CpGs with non-converging results [26][27][28]. We included only CpGs that were available in more than three cohorts. A total of 426,724 CpGs were included in the meta-analysis (Supplementary Table 5).
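Two of the probe exclusions above (non-varying CpGs and CpGs with too much missing data) are simple numeric filters and can be sketched in Python as follows. The sketch assumes that an interquartile range of <0.1% means <0.001 on the 0-1 beta scale and that missing values are encoded as None; the function name, the input layout, and the quantile method (Python's default) are illustrative assumptions rather than details given in the text.

```python
import statistics


def keep_probe(betas, min_iqr=0.001, max_missing=0.10):
    """Decide whether one CpG passes the variability and missingness filters.

    betas: list of beta values (0-1) across samples, None for missing.
    min_iqr: probes with an interquartile range below this are dropped
             (0.001 is an assumed reading of "<0.1%").
    max_missing: probes with this fraction of missing values or more are dropped.
    """
    if not betas:
        return False
    observed = [b for b in betas if b is not None]
    n_missing = len(betas) - len(observed)
    if n_missing / len(betas) >= max_missing:
        return False
    if len(observed) < 2:
        return False
    q1, _, q3 = statistics.quantiles(observed, n=4)
    return (q3 - q1) >= min_iqr
```

The remaining exclusions (control probes, cross-reactive probes, SNP-overlapping probes) depend on annotation lists rather than on the data and are not shown.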

The official gene name of each CpG site was noted via Illumina's genome coordinate. We used the name provided by Illumina with the UCSC Genome Browser and annotation data in Bioconductor. All annotations use the human February 2009 (GRCh37/hg19) assembly.

Cohort-specific statistical analyses
Each cohort independently ran a common pre-specified statistical analysis in R version 3.5.1. We estimated the association between locus-by-locus blood DNA methylation levels and all-cause mortality in each cohort using a Cox regression model. Proportional hazard assumptions were confirmed for each model in all cohorts. Familial relationship was also accounted for, when appropriate, in the model; FHS analyses included a cluster term for family structure, and TwinsUK analyses used random intercepts for zygosity and family structure. To avoid non-convergent results, cohorts with few deaths (KORA and TwinsUK) used a two-step analysis, in which each probe was first linearly regressed on the covariates, and the residuals were then used to perform a Cox mortality analysis.
Each cohort adjusted for harmonized covariates in the basic model: age (categories for decades), sex, and technical covariates (plate, chip, row, and column). A second set of fully-adjusted analyses adjusted for this initial list of covariates in addition to education level, self-reported recreational physical activity, smoking status, cumulative smoking (pack-years), body mass index, alcohol intake, hypertension, diabetes, and any personal history of cancer. Cohorts independently estimated cell type proportions using the reference-based Houseman method, which was subsequently extended by Horvath. Cell type correction was applied by including estimated cell type proportions (CD4T, NK cells, monocytes, granulocytes, plasma B cells, CD8T naïve, and memory and effector T cells) as covariates in cohort-specific statistical models. Each cohort underwent statistical validation of Cox proportional hazard assumptions before being included in the meta-analysis.

Meta-analysis
We performed inverse variance-weighted fixed-effects meta-analysis. Due to the variability of available CpG sites across cohorts after quality-control steps, we included only CpG sites that were available in three or more cohorts. We accounted for multiple testing at the 5% level using both the Bonferroni correction and the false discovery rate (FDR), the latter via the Benjamini-Hochberg procedure.
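As an illustration of these two steps, the following Python sketch pools cohort-level log hazard ratios for one CpG with inverse-variance weights and then applies the Benjamini-Hochberg step-up rule to a list of P-values. The function names and input layout are hypothetical; the authors ran their analyses in R, and this is only a sketch of the standard formulas.

```python
import math


def fixed_effect_meta(log_hrs, ses):
    """Inverse variance-weighted fixed-effects pooling for one CpG.

    log_hrs: per-cohort log hazard ratios; ses: their standard errors.
    Returns (pooled log HR, pooled SE, two-sided normal P-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking FDR discoveries at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= alpha * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

The Bonferroni threshold is simply alpha divided by the number of CpGs tested, so it needs no separate helper.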
For FDR-significant CpGs, we confirmed the robustness of the models and results in additional analyses using the leave-one-out cohort validation method, excluding one cohort at a time and then comparing model estimates for each CpG. We compared the hazard ratio (HR) and 95% confidence interval (95% CI) from each leave-one-out model with the estimates from our main models to evaluate the consistency of our findings. For each CpG, we evaluated goodness of the meta-analysis model using the I² statistic, a measure of inter-study variability, from random-effect meta-analyses.
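A minimal Python sketch of the two robustness checks follows. It assumes the usual Higgins-Thompson definition of I² based on Cochran's Q with inverse-variance weights, which may differ in detail from the random-effect software output the authors used; function names and inputs are hypothetical.

```python
def pooled_log_hr(log_hrs, ses):
    """Inverse variance-weighted pooled log hazard ratio."""
    weights = [1.0 / se ** 2 for se in ses]
    return sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)


def i_squared(log_hrs, ses):
    """I^2 (percent) from Cochran's Q: max(0, (Q - df) / Q) * 100."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = pooled_log_hr(log_hrs, ses)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def leave_one_out(log_hrs, ses):
    """Pooled estimate after dropping each cohort in turn."""
    return [
        pooled_log_hr(log_hrs[:i] + log_hrs[i + 1:], ses[:i] + ses[i + 1:])
        for i in range(len(log_hrs))
    ]
```

A CpG whose leave-one-out estimates all keep the same sign and a similar size as the full estimate would be considered robust to any single cohort.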

Enrichment analysis
We tested our results for enrichment using a publicly available catalog of all published GWAS relating genetic variants with human diseases (National Human Genome Research Institute-EBI GWAS Catalog) to elucidate potential associations [14]. Enrichment analysis was performed in R using a one-sided Fisher exact test. We controlled for false positives with the FDR procedure.
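The one-sided Fisher exact test used for enrichment can be written directly from the hypergeometric distribution, as in the Python sketch below. The 2x2 layout (genes mapped to significant CpGs versus background genes, in versus not in a GWAS trait set) is an assumed framing for illustration, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact P-value for over-representation.

    2x2 table (assumed layout):
                      in trait set   not in trait set
      significant          a                b
      background           c                d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)
```

The resulting P-values across traits would then be adjusted with the FDR procedure mentioned above.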
We evaluated whether CpG sites associated with mortality were enriched with genomic features provided in the Illumina annotation file (version 1.2; http://support.illumina.com/array/array_kits/infinium_humanmethylation450_beadchip_kit/downloads.html) to identify CpG location relative to the gene (i.e., body, first exon, 3'-UTR, 5'-UTR, within 200 bp of the transcriptional start site [TSS200], and within 1500 bp of the transcriptional start site [TSS1500]) and the relation of the CpG site to a CpG island, northern shelf, northern shore, southern shelf, and southern shore.
We also tested each gene mapped to the newly identified CpGs for tissue-specific expression using data from the Genotype Tissue Expression (GTEx) project as integrated by the Functional Mapping and Annotation (FUMA) tool [29], which allowed us to extract and interpret relevant biological information from publicly available repositories and provide interactive figures for prioritized genes. As a result, we obtained a heatmap of genes with normalized gene expression values (reads per kilobase per million). To obtain differentially expressed gene sets for each of 53 tissue types in the database, we used two-sided Student's t-tests on normalized expression per gene per tissue against all other tissues. We controlled for multiple comparisons with Bonferroni correction. Finally, we distinguished between genes upregulated and downregulated in a specific tissue compared to other tissues by accounting for the sign of the t-score [29].

Pathway analyses
To functionally interpret the genomic information identified from FDR-significant CpGs, we used the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, which links genomic information with higher-order functional information. Genomic information stored in the GENES database is a collection of gene catalogs for all completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. Higher-order functional information stored in the PATHWAY database contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction, and cell cycle [30]. We controlled our results for multiple comparisons with the FDR approach. Finally, we confirmed our results with the Database for Annotation, Visualization and Integrated Discovery (DAVID). We tested for enrichment in gene ontology biological processes and applied the Benjamini-Hochberg procedure to control for false positives. We mapped each CpG significantly associated with mortality to genes on the basis of the 450K BeadChip annotation file. We excluded CpGs lacking annotated genes within 10 Mb (n = 3). Using topGO in R, we tested for gene enrichment over the background array (16,119 unique annotated Entrez Gene IDs) by using Fisher's exact tests with a minimum of two genes per node.

Integrating DNA methylation with quantitative trait loci analysis (meQTL)
A subset of 713 KORA samples was genotyped on an Affymetrix Axiom array. We removed variants with a call rate of <0.98, Hardy-Weinberg equilibrium P < 5 × 10⁻⁶, and minor allele frequency < 0.01. We considered only variants with an information score > 3. Imputation was performed using the 1000 Genomes Project phase I version 3 reference panel with IMPUTE 2.3.0. Phasing of data was performed using SHAPEIT v2. We retained approximately 10,000,000 variants for analyses. In each model, we used DNA methylation beta values as independent variables and SNPs as dependent variables. We adjusted each model for age, sex, body mass index, and white blood cell proportions. We used OmicABEL [31] for the analyses and genotype probabilities for each variant. Because of the large size of the output, we retained only variants with P < 1 × 10⁻⁴. We considered results genome-wide significant at P < 1 × 10⁻¹⁴. We reported only associations with CpGs significant in the epigenome-wide association study.

Integrating DNA methylation with gene expression (eQTM)
In KORA, 998 individuals had both valid methylation and blood gene expression data, which we used to assess whether DNA methylation was correlated with gene expression. Gene expression data (Illumina HumanHT-12 v3 Expression BeadChip) were quality controlled with GenomeStudio, and samples with <6,000 detected genes were excluded from analysis. All samples were log2-transformed and quantile-normalized using the Bioconductor package lumi [32]. A total of 48,803 expression probes passed quality control. We used R (version 3.3.1) to run a linear mixed effects model adjusting for covariates (age, sex, blood cell proportions, and technical variables of RNA integrity number, sample storage time, and RNA amplification batch) and a random intercept for RNA amplification batch. Models were run for each of the nine newly identified CpGs associated with mortality. We filtered results to report only CpG-expression probe pairs located on the same chromosome. Start and end sites for each gene were determined according to the Illumina HT annotation file. A cutoff of 500,000 bp was used to differentiate cis- vs. trans-eQTMs.
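The cis/trans rule in the last sentence can be sketched in Python as below. The sketch assumes that distance is measured from the CpG to the nearest gene boundary (zero if the CpG lies inside the gene) and that a distance exactly equal to the cutoff counts as cis; neither detail is stated in the text, and the function name and coordinates are hypothetical.

```python
CIS_WINDOW_BP = 500_000  # cutoff stated in the text


def classify_eqtm(cpg_pos, gene_start, gene_end, window=CIS_WINDOW_BP):
    """Label a same-chromosome CpG-gene pair as 'cis' or 'trans'.

    cpg_pos, gene_start, gene_end: base-pair coordinates on one chromosome.
    Distance is taken to the nearest gene boundary (assumption).
    """
    if gene_start > gene_end:
        gene_start, gene_end = gene_end, gene_start
    if gene_start <= cpg_pos <= gene_end:
        distance = 0
    else:
        distance = min(abs(cpg_pos - gene_start), abs(cpg_pos - gene_end))
    return "cis" if distance <= window else "trans"
```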

Miettinen's population attributable fraction and Mendelian randomization analysis
To assess the contribution of methylation levels of each CpG to all-cause mortality, we calculated Miettinen's population attributable fraction on data from the in-house Normative Aging Study (NAS) and Women's Health Initiative-Epigenetic Mechanisms of Particulate Matter-Mediated Cardiovascular Disease (WHI-EMPC) for European and African American ancestries. Population attributable fraction takes into account the strength of association between the risk factor (DNA methylation higher than the mean in specific CpG sites) and the outcome (mortality) as well as the prevalence of the risk factor in the population [33]. This metric provides estimates of the public health importance of risk factors, ascertaining what proportion of the outcome is due to exposure to the risk factor, and distinguishes between the etiologic fraction attributable to or related to the given risk factor depending on whether all or just some confounding by extraneous factors was under control [33]. To support information about the population attributable fraction, we also included two Mendelian randomization approaches.
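For orientation, Miettinen's formula is commonly written as PAF = p_c × (RR − 1) / RR, where p_c is the prevalence of the exposure among cases and RR is the adjusted relative risk. The Python sketch below applies it with an adjusted hazard ratio standing in for RR and with exposure defined as methylation above the mean; both substitutions are assumptions made for illustration, and the function name is hypothetical.

```python
def miettinen_paf(prevalence_among_cases, hazard_ratio):
    """Miettinen's population attributable fraction.

    prevalence_among_cases: share of deaths with methylation above the mean
                            (assumed exposure definition), between 0 and 1.
    hazard_ratio: adjusted HR for exposed vs. unexposed, used in place of RR.
    A negative result corresponds to a protective exposure (HR < 1).
    """
    if not 0.0 <= prevalence_among_cases <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if hazard_ratio <= 0:
        raise ValueError("hazard ratio must be positive")
    return prevalence_among_cases * (hazard_ratio - 1.0) / hazard_ratio
```

For example, if half of all deaths occurred among exposed participants and the adjusted HR were 1.25, the fraction would be 0.5 × 0.25 / 1.25 = 0.10, that is, 10% of deaths.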
We evaluated the causal effect on all-cause mortality of FDR-significant CpGs by using two-sample Mendelian randomization analyses and summary statistics from published GWAS for chronic diseases and longevity [34] and chronic diseases, including CHD [35] , as these three methods use different assumptions to provide consistent causal effect estimates even with invalid instruments arising from horizontal pleiotropy, a primary source of bias in multi-variant Mendelian randomization analyses.

FDR-significant CpGs, DNA methylation-related aging measures, and mortality risk score
PhenoAge, a composite measure of CpG sites representing phenotypic age, captures differences between lifespan and health span. The Horvath clock is a linear combination of sites identifying the cumulative effect of an epigenetic maintenance system [1,45]. Among the 513 CpGs comprising PhenoAge, 41 are shared with the Horvath clock. While both aging measures correlate strongly with age in every tissue and cell type tested, and both captured risks for mortality across multiple tissues and cells, PhenoAge is highly predictive of nearly every morbidity [1,10]. Blood PhenoAge outperformed the Horvath clock with regard to predictions for a variety of aging outcomes, including all-cause mortality. The mortality risk score instead was based on results using discovery cohort ESTHER (61 years old on average) and both ESTHER and KORA for validation [11].
To investigate whether the association of FDR-significant CpGs with mortality was independent of DNA methylation aging measures and risk score, we included acceleration of PhenoAge and Horvath clock, defined respectively as discrepancies between age with PhenoAge and Horvath clock age and the risk score. We also identified the correlation between each CpG included in the risk score and our FDR-significant CpGs, and we compared our pooled meta-analysis results with previous findings.

Cell-type fractions and all-cause mortality
Cell-type fractions, in particular the neutrophil-to-lymphocyte ratio (NLR), influence DNA methylation levels and have been associated with comorbidities and mortality [20][21][22]. To elucidate which cell proportions were associated with mortality when adjusting for DNA methylation at FDR-significant CpGs, we included NLR, which has been associated with lung cancer risk and mortality [21] as well as cardiovascular disease and mortality in prospective studies [22]. NLR was computed from DNA methylation data using the method of Koestler et al. [46].
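The ratio itself can be sketched as follows, assuming methylation-derived cell-type fractions are already available. Approximating neutrophils by the estimated granulocyte fraction and summing CD4T, CD8T, NK, and B-cell fractions as lymphocytes are assumptions of this sketch; the exact procedure is the one described in [46]. The dictionary keys are hypothetical labels.

```python
# Cell types counted as lymphocytes in this sketch (assumed labels).
LYMPHOCYTE_TYPES = ("CD4T", "CD8T", "NK", "Bcell")


def methylation_derived_nlr(fractions: dict) -> float:
    """Ratio of the granulocyte fraction to the summed lymphocyte fractions."""
    lymphocytes = sum(fractions[cell] for cell in LYMPHOCYTE_TYPES)
    if lymphocytes <= 0:
        raise ValueError("lymphocyte fraction must be positive")
    return fractions["Gran"] / lymphocytes


# Hypothetical estimated fractions for one blood sample.
sample = {"Gran": 0.60, "CD4T": 0.15, "CD8T": 0.05, "NK": 0.05, "Bcell": 0.05, "Mono": 0.10}
nlr = methylation_derived_nlr(sample)
print(f"NLR = {nlr:.2f}")
```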

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.

FUNDING
ARIC has been funded in whole or in part with federal funds from the U.S. National Heart, Lung, and Blood Institute, National Institutes of Health (NIH), Department of Health and Human Services (HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I, HHSN268201700005I). The authors thank the staff and participants of the ARIC study for their important contributions. Funding was also provided by 5RC2HL102419 and R01NS087541.

... (1990-1992) or Visit 3 (1993-1995), when the DNA used for methylation quantification was collected. Covariates were measured at the time of blood draw, unless otherwise specified. Data on education, smoking status, smoking pack-years, alcohol intake, and physical activity were obtained by self-report at Visit 1. Trained technicians took fasting blood samples and measured height and weight using standard protocols. Diabetes was defined as a fasting blood glucose level of ≥126 mg/dL, non-fasting blood glucose level of ≥200 mg/dL, self-reported physician diagnosis of diabetes, or use of antidiabetic medication in the past 2 weeks. Hypertension was defined as systolic blood pressure ≥140 mm Hg, diastolic blood pressure ≥90 mm Hg, or self-reported use of antihypertensive medication in the past 2 weeks. History of cancer was defined by self-report or by incident cancer cases identified between Visit 1 and the time of blood draw through cancer registry and hospital linkage. History of coronary heart disease (CHD) was defined as self-reported history at baseline or an adjudicated event (myocardial infarction (MI), silent MI, coronary artery bypass surgery, or angioplasty) identified between Visit 1 and the time of blood draw.

ARIC death ascertainment
Deaths among cohort participants were identified through December 2012 via annual telephone calls and by surveillance of local death certificates and obituaries. If a participant was lost to telephone follow-up, a National Death Index search was conducted.

ARIC DNA methylation quantification
Genomic DNA was extracted from peripheral blood leukocyte samples using the Gentra Puregene Blood Kit (San Diego, CA, USA). Background subtraction was conducted with the GenomeStudio software using built-in negative control bead types on the array. Positive and negative controls and sample replicates were included on each 96-well plate assayed. After exclusion of controls, replicates, and samples with integrity issues or failed bisulfite conversion, a total of 2,841 study participants had HM450K data available for further quality control (QC) analyses. We removed poor-quality samples with pass rate of <99% (i.e., if the sample had at least 1% of CpG sites with detection P-value > 0.01 or missing), indicative of lower DNA quality or incomplete bisulfite conversion, and samples with a possible gender mismatch based on evaluation of selected CpG sites on the Y chromosome. Additional details have been published elsewhere [2,3].
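The sample-level pass-rate filter described above can be sketched as follows. This is a minimal illustration, not the pipeline that was actually run; the function names are hypothetical, and missing values are represented as None.

```python
def sample_pass_rate(detection_p, threshold=0.01):
    """Fraction of CpG sites whose detection P-value is present and <= threshold."""
    passed = sum(1 for p in detection_p if p is not None and p <= threshold)
    return passed / len(detection_p)


def keep_sample(detection_p, min_pass_rate=0.99, threshold=0.01):
    """Keep a sample unless its pass rate is below 99%.

    Note: a sample with exactly 1% failing sites is kept under the
    "pass rate < 99%" wording used here.
    """
    return sample_pass_rate(detection_p, threshold) >= min_pass_rate


# Hypothetical samples with 1,000 sites each.
good = [0.001] * 995 + [0.2] * 3 + [None] * 2   # 0.5% failing or missing
poor = [0.001] * 980 + [0.2] * 20               # 2% failing
print(keep_sample(good), keep_sample(poor))     # True False
```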

FHS study participants
The FHS Offspring Cohort began enrollment in 1971 and included 5,124 offspring of the FHS original cohort as well as spouses of the offspring. Participants were eligible for the current study if they attended the eighth examination cycle (2005-2008) and consented to have their DNA used for genetic research. All participants provided written informed consent at the time of each examination visit. The study protocol was approved by the Institutional Review Board at Boston University Medical Center (Boston, MA). FHS data are available in dbGaP (accession number: phs000724.v2.p9).

FHS death ascertainment
Deaths among FHS participants that occurred before January 1, 2013 were ascertained using multiple strategies, including routine contact with participants for health history updates, surveillance at the local hospital, obituaries in the local newspaper, and queries to the National Death Index. Death certificates, hospital and nursing home records before death, and autopsy reports were requested. When cause of death was undeterminable, the next of kin were interviewed. The date and cause of death were reviewed by an endpoint panel of three investigators.

FHS DNA methylation quantification
Peripheral blood samples were collected at the 8th examination. Genomic DNA was extracted from buffy coats using the Gentra Puregene DNA extraction kit (Qiagen) and bisulfite converted using the EZ DNA Methylation kit (Zymo Research). DNA methylation quantification was conducted in two laboratory batches using the Illumina Infinium HumanMethylation450 array. Methylation beta values were generated using the Bioconductor minfi package with background correction. Sample exclusion criteria included poor SNP matching of control positions, missing rate >1%, outliers from multi-dimensional scaling, and sex mismatch. In addition, we excluded individuals with leukemia and those who received chemotherapy. Additional sample exclusions included those with mismatches in their reported sex and methylation-predicted sex as well as methylation-predicted tissues that were not blood. Lastly, samples with correlation with our reference population of r < 0.80 were excluded. Predicted sex, tissues, correlation with reference population, and DNA methylation-predicted ages were computed using our online age calculator (http://labs.genetics.ucla.edu/horvath/dnamage). Background subtraction was applied using the preprocessIllumina command in the minfi Bioconductor package [4]. In total, 2,635 samples and 443,304 CpG probes remained for analysis.

InCHIANTI study participants
The InCHIANTI Study is a population-based prospective cohort study of residents ≥20 years old from two areas in the Chianti region of Tuscany, Italy. Sampling and data collection procedures have been described elsewhere [5]. Briefly, 1,326 participants donated a blood sample at baseline (1998-2000), of whom 784 also donated a blood sample at 9-year follow-up (2007-2009). DNA methylation was assayed using the Illumina Infinium HumanMethylation450 platform in DNA samples corresponding to participants with sufficient DNA at both baseline and Year 9 visits (n = 499). All participants provided written informed consent to participate in this study. The study complied with the Declaration of Helsinki. The Italian National Institute of Research and Care on Aging Institutional Review Board approved the study protocol.

InCHIANTI death ascertainment
Vital status was ascertained using data from the Tuscany Regional Mortality General Registry. Deaths were assessed until December 1, 2014.

InCHIANTI DNA methylation quantification
Genomic DNA was extracted from buffy coat samples using an AutoGen Flex and quantified on a Nanodrop1000 spectrophotometer before bisulfite conversion. Genomic DNA was bisulfite converted using the Zymo EZ-96 DNA Methylation Kit (Zymo Research) per the manufacturer's protocol. CpG methylation status of 485,577 CpG sites was determined using the Illumina Infinium HumanMethylation450 BeadChip per the manufacturer's protocol, as previously described [6]. Initial data analysis was performed using GenomeStudio 2011.1 (Model M Version 1.9.0, Illumina Inc.). Threshold call rate for inclusion of samples was 95%. Quality control of sample handling included comparison of clinically reported sex versus sex of the same samples determined by analysis of methylation levels of CpG sites on the X chromosome [6]. Background subtraction was applied using the preprocessIllumina command in the minfi Bioconductor package [4].

KORA cohort description
The KORA study is an independent population-based cohort from Augsburg, Southern Germany. Whole blood samples of the KORA F4 survey (examination 2006-2008), a seven-year follow-up study of the KORA S4 cohort, were used. Of the 4,621 participants in the KORA S4 baseline study, 3,080 took part in the KORA F4 follow-up study [7]. Participants provided written informed consent, and the study was approved by the local ethics committee (Bayerische Landesärztekammer). For 1,799 subjects, methylation data as well as information about death ascertainment were available. Before analyses, all individuals with a detection P-value > 0.05 for >1% of probes were removed (375 individuals). Sex checks performed during calculation of DNAmAge resulted in the removal of another 167 individuals, 137 of whom had an "unsure" gender. This left 1,257 individuals for analysis. At the KORA F4 follow-up examination, all individuals completed questionnaires and physical examinations conducted by trained staff covering demographics, lifestyle, and medical history since the KORA S4 examination. Collected information included age, sex, years of education, smoking status (current regular, current irregular, former, never), pack-years, alcohol consumption (g/day), physical activity (active, inactive), diabetes status, hypertension status, self-reported cancer diagnosis, and body mass index (BMI), among other clinical variables [7].

KORA mortality ascertainment
The vital status of all F4 participants was ascertained through the population registries inside and outside the study area in 2011 (cut-off date: December 31, 2011). Record linkage was based on name, sex, date of birth, and address. If the person had died, the time and location of death were assessed via population registries, and a copy of the death certificate was obtained from the Regional Health Department. If the person had moved out of the study area, the time of the move and information on the new address were typically available. Vital status could not be assessed for those who had moved to a foreign country or to an unknown location in the country. Causes of death were coded according to ICD-9. There were a total of 42 deaths, including 16 from cardiovascular disease and 17 from cancer.

KORA DNA methylation measures
Whole blood was drawn into serum gel tubes. We bisulfite-converted 1 µg of genomic DNA using the EZ-96 DNA Methylation Kit (Zymo Research) according to the manufacturer's procedure, with the alternative incubation conditions recommended when using the Illumina Infinium Methylation Assay. Genome-wide DNA methylation was analyzed in 1,799 subjects using the Illumina Infinium HumanMethylation450 BeadChip Array. Raw methylation data were extracted using Illumina GenomeStudio (version 2011.1) with the methylation module (version 1.9.0). Preprocessing was performed with R (version 3.0.1). Probes with signals from fewer than three functional beads and probes with a detection P-value > 0.01 were defined as low-confidence probes. Probes that covered SNPs (MAF in Europeans > 5%) were excluded from the data set. A color bias adjustment was performed with the R package lumi (version 2.12.0) by smooth quantile normalization and background correction based on negative control probes present on the Infinium HumanMethylation BeadChip. This was performed separately for the two color channels and for each chip. β-values corresponding to low-confidence probes were set to missing. A 95% call rate threshold was applied to samples and CpG sites. Beta-mixture quantile normalization (BMIQ) was applied by using the R package wateRmelon, version 1.0.3. Plate and batch effects were investigated by principal component analysis and eigenR2 analysis, because KORA F4 samples were processed on 20 96-well plates across nine different batches.
Probes with a detection P > 0.05 for > 1% of samples were removed as well as all "ch" and "rs" probes, leaving a total of 431,217 probes for analysis. Although raw beta values were used in Dr. Horvath's online calculator to determine cell counts, normalized data were used for the final analyses.
To reduce non-biological variability between observations, data were normalized using quantile normalization on raw signal intensities. Specifically, quantile normalization was stratified into six probe categories based on probe type and color channel (i.e., Infinium I signals from beads targeting methylated CpG sites obtained through red and green color channels, Infinium I signals from beads targeting unmethylated CpG sites obtained through red and green color channels, and Infinium II signals obtained through red and green color channels [8]) using the R package limma, version 3.16.5 [9]. Further, to correct the shift in the distribution of methylation values observed for the two different assay designs (Infinium I and Infinium II) on the BeadChip, BMIQ was applied [10] using the R package wateRmelon, version 1.0.3 [11].
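The idea of quantile normalization stratified by probe category can be sketched in plain Python as follows. This is an illustration of the technique only, not the limma implementation used in the study; the function names and category labels are hypothetical, and ties are broken by order of appearance rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize per-sample intensity lists of equal length.

    Each sample's values are replaced by the across-sample mean of the
    values that share the same rank.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=col.__getitem__)  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def stratified_quantile_normalize(columns, categories):
    """Apply quantile normalization separately within each probe category."""
    groups = {}
    for idx, category in enumerate(categories):
        groups.setdefault(category, []).append(idx)
    result = [list(col) for col in columns]
    for idxs in groups.values():
        subset = [[col[i] for i in idxs] for col in columns]
        for s, norm_col in enumerate(quantile_normalize(subset)):
            for i, value in zip(idxs, norm_col):
                result[s][i] = value
    return result


# Two hypothetical samples, four probes in two categories.
raw = [[5.0, 2.0, 3.0, 10.0], [4.0, 1.0, 6.0, 20.0]]
cats = ["I_red_meth", "I_red_meth", "II_green", "II_green"]
print(stratified_quantile_normalize(raw, cats))
```

After normalization, every sample has the same distribution of values within each category, while differences between categories are left for a later step such as BMIQ.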

Lothian birth cohorts of 1921 and 1936 (LBC1921 and LBC1936)
LBC cohort description
LBC1921 and LBC1936 are two longitudinal studies of aging [12,13] that derive from the Scottish Mental Surveys of 1932 and 1947, respectively, when nearly all 11-year-old children in Scotland completed a test of general cognitive ability [14]. Survivors living in the Lothian area of Scotland were recruited in late life at a mean age of 79 years for LBC1921 (n = 550) and a mean age of 70 years for LBC1936 (n = 1,091). Follow-up took place at ages 70, 73, and 76 years in LBC1936 and ages 79, 83, 87, and 90 years in LBC1921. Collected data include genetic information, longitudinal epigenetic information, longitudinal brain imaging (LBC1936), numerous blood biomarkers, and anthropometric and lifestyle measures. Post-QC, DNA methylation data were available for 920 LBC1936 participants at age 70 years and for 446 LBC1921 participants at age 79 years. At each in-person visit, participants completed questionnaires regarding demography, lifestyle, and medical history. They reported chronological age, years of education, smoking status (never, former, current), pack-years consumption (continuous), alcohol consumption (light, moderate, and heavy drinkers), self-reported type 2 diabetes, cancer, and hypertension. BMI was computed from anthropometric measures. Participants were asked to remove their shoes before a SECA stadiometer was used to assess height in centimeters. Weight (after removing shoes and outer clothing) was measured in kilograms using a digital readout from electronic SECA scales.

LBC mortality ascertainment
For both LBC1921 and LBC1936, mortality status was obtained via data linkage from the National Health Service Central Register, provided by the General Register Office for Scotland (now National Records of Scotland). Participant deaths and cause of death are routinely flagged to the research team about every 12 weeks. The last update available for the current project was 26th November 2014.

LBC DNA methylation measures
Details of the collection and QC steps for LBC methylation data have been reported previously [12,15]. Briefly, the Illumina Infinium HumanMethylation450 BeadChip was used to measure DNA methylation in whole blood of consenting participants. Background correction was performed, and QC was used to remove probes with a low detection rate, low quality (manual inspection), and low call rate as well as samples with a poor match between genotypes and SNP control probes or incorrect predicted sex. Additional QC was performed to remove samples in which >1% of probes had a detection P > 0.05 and probes for which >1% of samples had a detection P > 0.05. The working set included 442,227 CpG probes.

NAS cohort description
The ongoing longitudinal US Department of Veterans Affairs NAS was established in 1963 and included men 21-80 years old and free of known chronic medical conditions at entry [16]. Participants were invited to medical examinations every three to five years. At each visit, men provided information on medical history, lifestyle, and demographic factors and underwent physical examinations and laboratory tests. DNA samples were collected from 675 active participants between 1999 and 2007 [16]. We excluded participants who were non-white or who reported leukemia at the time of DNA extraction, leaving a total of 646 individuals with a single observation each. Participants provided written informed consent at each visit. The NAS study was approved by the institutional review boards of participating institutions. At each in-person visit, participants completed questionnaires regarding demography, lifestyle, and medical history. They reported chronological age, years of education, smoking status (never, former, current), pack-years consumption (continuous), alcohol consumption (<2, ≥2 drinks/day), physical activity (<12, 12-30, ≥30 metabolic equivalent hours [MET-h] per week), type 2 diabetes (self-reported diagnosis and/or use of diabetes medications), diagnosis of CHD (validated on medical records, ECG, and physician exams), and diagnosis of malignant cancer in the five years prior to the visit (diagnosed with ICD-9 code). High blood pressure was defined as antihypertensive medication use, systolic blood pressure ≥140 mmHg, or diastolic blood pressure ≥90 mmHg at study visit. BMI was computed from anthropometric measures, performed with participants in undershorts and socks [17].

NAS mortality ascertainment
Official death certificates were obtained for decedents from the appropriate state health departments and were reviewed by a physician. An experienced research nurse coded the cause of death using ICD-9. Both participant deaths and causes of death were routinely updated by the research team, and the last update available was December 31, 2013 [12].

NAS DNA methylation measures
DNA was extracted from buffy coats using the QIAamp DNA Blood Kit (Qiagen). We used 500 ng of DNA for bisulfite conversion using the EZ-96 DNA Methylation Kit (Zymo Research). To reduce chip and plate effects, we used a two-stage age-stratified algorithm to randomize samples and ensure similar age distributions across chips and plates; 12 samples that were sampled across all age quartiles were randomized to each chip, and then chips were randomized to plates (8 chips/plate).
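One way to realize such a two-stage, age-stratified randomization is sketched below. The interleaving strategy, function name, and seed handling are assumptions made for illustration and are not the authors' algorithm; only the 12 samples per chip and 8 chips per plate come from the text.

```python
import random


def randomize_to_chips_and_plates(ages, samples_per_chip=12, chips_per_plate=8, seed=0):
    """Assign samples to chips (stage 1) and chips to plates (stage 2).

    ages: dict mapping sample ID to age.
    Returns a list of plates; each plate is a list of chips; each chip is a
    list of sample IDs drawn from all four age quartiles.
    """
    rng = random.Random(seed)
    ordered = sorted(ages, key=ages.get)  # sample IDs from youngest to oldest
    n = len(ordered)
    quartiles = [ordered[n * q // 4: n * (q + 1) // 4] for q in range(4)]
    for quartile in quartiles:
        rng.shuffle(quartile)  # random order within each age quartile

    # Stage 1: interleave the quartiles so each run of 12 spans all of them.
    interleaved = []
    for i in range(max(len(q) for q in quartiles)):
        for quartile in quartiles:
            if i < len(quartile):
                interleaved.append(quartile[i])
    chips = [interleaved[i:i + samples_per_chip] for i in range(0, n, samples_per_chip)]

    # Stage 2: randomize whole chips to plates.
    rng.shuffle(chips)
    return [chips[i:i + chips_per_plate] for i in range(0, len(chips), chips_per_plate)]


# Hypothetical cohort of 192 samples with distinct ages.
demo_ages = {f"s{i}": float(i) for i in range(192)}
demo_plates = randomize_to_chips_and_plates(demo_ages, seed=1)
print(len(demo_plates), "plates,", sum(len(p) for p in demo_plates), "chips")
```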
QC analysis was performed to remove samples in which >1% of probes had a detection P > 0.05 and probes for which >1% of samples had a detection P > 0.05. Remaining samples were preprocessed using the Illumina-type background correction [18] and normalized with dye-bias [19] and BMIQ [20] adjustments, which were used to generate beta methylation values. The working set included 477,928 CpG probes. DNA methylation age was computed using the Horvath calculator from background-corrected methylation data; for this calculation, QC analysis was performed only on samples, leaving 485,512 CpG and CpH probes in the working set.

TwinsUK study participants
The TwinsUK cohort was established in 1992 and recruited both monozygotic and dizygotic same-sex twins in the United Kingdom. The majority of participants are female, Caucasian, and mostly disease-free at time of ascertainment. There are >13,000 twin participants in the cohort, of whom 805 were included in the current study. Whole blood samples were collected during participants' clinical visits, along with questionnaire data on phenotype and lifestyle factors. All subjects provided written informed consent [21].

Information on physical activity, smoking pack-years, plate number, and chip position number was not available for subjects in the TwinsUK dataset; these variables were therefore not included as covariates in any analysis.

TwinsUK death ascertainment
Mortality data were collected using two approaches: 1) during routine contact for standard clinical visits in TwinsUK, and 2) using queries to the National Death Register. Date and cause of death were recorded.

TwinsUK DNA methylation quantification
DNA samples were extracted from whole blood using the DNeasy kit (Qiagen). DNA was bisulfite converted using the EZ DNA methylation kit (Zymo Research). Methylation levels were profiled using the Illumina Infinium HumanMethylation450 array, and methylation betas were generated using the R package minfi with background correction. Raw beta levels were subjected to BMIQ dilation to correct for technical effects. Probe exclusion criteria included probes that mapped to multiple locations in the reference sequence and probes in which >1% of subjects had detection P > 0.05. Individuals with >5% missing probes, with mismatched sex, and with mismatched genotypes were also excluded. Methylation-predicted sex, methylation-predicted blood cell types, correlations with the reference population, and DNA methylation-predicted age were computed using the online epigenetic age calculator (http://labs.genetics.ucla.edu/horvath/dnamage).

WHI-BAA23 cohort description
Subjects included a subsample of participants of the WHI study, a national study that began in 1993 and enrolled postmenopausal women 50-79 years of age into one of three randomized clinical trials. Women were selected from one of two large WHI sub-cohorts that had previously undergone genome-wide genotyping as well as profiling for 7 cardiovascular disease-related biomarkers, including total cholesterol, high-density lipoprotein, low-density lipoprotein, triglycerides, C-reactive protein (CRP), creatinine, insulin, and glucose through two core WHI ancillary studies [22]. The first cohort is the WHI SNP Health Association Resource (SHARe) cohort of minorities that includes >8,000 African American (AA) women and >3,500 Hispanic women. Women were genotyped through the WHI core study M5-SHARe (www.whi.org/researchers/data/WHIStudies/StudySites/M5) and underwent biomarker profiling through WHI core study W54-SHARe (...data/WHIStudies/StudySites/W54). The second cohort consists of a combination of European Americans (EA) from two hormonal therapy trials selected for GWAS and biomarkers in core studies W58 (.../data/WHIStudies/StudySites/W58) and W63 (.../data/WHIStudies/StudySites/W63). From these two cohorts, two sample sets were formed. Sample Set 1 comprises 637 CHD cases and 631 non-CHD cases as of September 30, 2010. Sample Set 2 is a non-overlapping sample of 432 CHD cases and 472 non-CHD cases as of September 17, 2012. The ethnic groups differed in terms of age distribution, as Caucasian women tended to be older. We acknowledge a potential for selection bias using the above-described sampling scheme in WHI but suspect that if such bias is present, it is minimal. First, selection bias may be introduced by restricting our methylation profiling at baseline to women who also had GWAS and biomarker data from baseline, given the requirement that these subjects must have signed the WHI supplemental consent for broad sharing of genetic data in 2005.
However, we believe that selection bias at this stage is minimized by the inclusion of subjects who died between the start of the WHI study and the time of supplemental consent in 2005, so that only ~6%-8% of all WHI participants were excluded. Subjects unable or unwilling to sign consent in 2005 may not represent a random subset of all participants who survived to 2005. Second, some selection bias may also occur if similar gross differences exist in the characteristics of participants who consented to be followed in the two WHI extension studies beginning in 2005 and 2010 compared to non-participants at each stage. We believe these selection biases, if present, have minimal effects on our effect estimates. Data are available from https://www.whi.org/researchers/Stories/June%202015%20WHI%20Investigators'%20Datasets%20Released.aspx and https://www.whi.org/researchers/data/Documents/WHI%20Data%20Preparation%20and%20Use.pdf

WHI-BAA23 death ascertainment
We used the variable "DEATHALL" from form 124/120 that incorporated any report of death (as of August 2015).

WHI-BAA23 DNA methylation quantification
In brief, bisulfite conversion using the Zymo EZ DNA Methylation Kit (Zymo Research), subsequent hybridization to the Illumina HumanMethylation450k BeadChip, and scanning (iScan, Illumina) were performed according to the manufacturer's protocols using standard settings. DNA methylation levels (β values) were calculated from the intensities of the methylated (M, corresponding to signal A) and unmethylated (U, corresponding to signal B) sites as the ratio of fluorescent signals β = max(M,0)/[max(M,0) + max(U,0) + 100]. Thus, β values range from 0 (completely unmethylated) to 1 (completely methylated).
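The β-value formula above translates directly into a few lines of code. This is a minimal sketch; the function name is hypothetical, and the intensities in the example are made up.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """beta = max(M, 0) / [max(M, 0) + max(U, 0) + offset]."""
    m = max(methylated, 0.0)    # negative background-corrected signals become 0
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Hypothetical signal intensities.
print(beta_value(4500.0, 400.0))   # mostly methylated: 0.9
print(beta_value(-20.0, 3000.0))   # negative M is clipped: 0.0
```

Because of the constant offset of 100 in the denominator, β approaches but never exactly reaches 1, and low-intensity probes are pulled toward 0.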

WHI-EMPC cohort description
WHI-EMPC is an ancillary study of epigenetic mechanisms underlying associations between ambient particulate matter (PM) air pollution and cardiovascular disease in the WHI clinical trials (CT) cohort. It is funded by the National Institute of Environmental Health Sciences (R01-ES020836).
The WHI-EMPC study population is a stratified, random sample of 2,200 WHI CT participants who were examined in 1993-2001; had available buffy coats, core analytes, electrocardiograms, and ambient concentrations of PM; and were not taking anti-arrhythmic medications at the time.
As such, WHI-EMPC is representative of the larger, multiethnic WHI CT population from which it was sampled: 68,132 participants aged 50-79 years who were randomized to hormone therapy, calcium/vitamin D supplementation, and/or dietary modification in 40 U.S. clinical centers at baseline exam (1993-1998) and re-examined in the fasting state one, three, six, and nine years later [23,24]. During participant visits, data on age, race/ethnicity, education, smoking status (current, former, never), pack-years of smoking, alcohol consumption (drinks per week), recreational physical activity (MET-hours/week), weight/height/BMI, systolic and diastolic blood pressure, medication use, CHD, type 2 diabetes, and cancer diagnosis were obtained.
Hypertension status was based on systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg or antihypertensive medication use (angiotensin converting enzyme inhibitors, angiotensin II receptor antagonists, beta blockers, calcium channel blockers, thiazides). CHD was defined by a history of myocardial infarction (acute, hospitalized, definite or probable events supported by cardiac pain, electrocardiogram, and biomarker data) or revascularization procedure (coronary artery bypass graft, percutaneous coronary angioplasty, stent) and was self-reported at baseline and confirmed by physician-review, classification, and local/central adjudication of medical records during follow-up. Type 2 diabetes was defined by a self-reported history of physician-treated diabetes, fasting glucose ≥126 mg/dL, non-fasting glucose ≥200 mg/dL, or anti-diabetic medication use. Cancer was defined by a diagnosis of any cancer, excluding leukemia and other hematologic malignancies (Hodgkin's lymphoma, non-Hodgkin's lymphoma, multiple myeloma). Current analyses involve information collected at the first available visit with available DNA methylation data and stratification by race/ethnicity [European (WHI-EMPC-EA) and African American (WHI-EMPC-AA) ancestries].

WHI-EMPC mortality ascertainment
All-cause mortality and sub-classification of the underlying cause of death to cardiovascular or cancer mortality were based on WHI physician review of death certificates, medical records, and autopsy reports. Cardiovascular disease mortality was defined as death due to definite or possible CHD, cerebrovascular disease, or other or unknown cardiovascular disease. Cancer mortality was defined as death due to any cancer. Participants affected by leukemia or other hematologic malignancies (i.e., Hodgkin's lymphoma, non-Hodgkin's lymphoma, multiple myeloma) were excluded due to known effects on red cell, white cell, and platelet counts.

WHI-EMPC DNA methylation quantification
Genome-wide DNA methylation at CpG sites was measured using the Illumina 450K Infinium Methylation BeadChip, quantitatively represented by beta (percentage of methylated cytosines over the sum of methylated and unmethylated cytosines), and quality-controlled using the following filters: detection P > 0.01 in >10% of samples, detection P > 0.01 or missing in >1% of probes, and probes with a coefficient of variation <5%, yielding values of beta at 293,171 sites. DNA methylation data were normalized using BMIQ [25] and stage-adjusted using ComBat [10]. Modeled epigenome-wide associations also adjusted for cell subtype proportions (CD8-T, CD4-T, B cell, natural killer, monocyte, and granulocyte) [26] and for technical covariates, including plate, chip, row, and column.

Odds ratio (OR), lower 95% confidence interval (LCI), and upper 95% confidence interval (UCI) given per 10% higher methylation. Associations for coronary heart disease taken from (PMC4589895) and association for serum creatinine taken from (PMC4735748). Associations for ARIES methQTLs extracted from MR-base (PMC5976434) using middle age estimates for methQTLs from ARIES cohort (PMC4818469). MR = Mendelian randomization; N SNPs = number of SNPs (instruments) used for the MR analyses; P = p-value.