Epigenome‐wide analyses identify DNA methylation signatures of dementia risk

Abstract
Introduction: Dementia pathogenesis begins years before clinical symptom onset, so understanding premorbid risk mechanisms is essential. Here we investigated potential pathogenic mechanisms by assessing DNA methylation associations with dementia risk factors in Alzheimer's disease (AD)-free participants.
Methods: Associations between dementia risk measures (family history, AD genetic risk score [GRS], and dementia risk scores [combining lifestyle, demographic, and genetic factors]) and whole-blood DNA methylation were assessed in discovery and replication samples (n ≈ 400 to ≈ 5000) from Generation Scotland.
Results: AD genetic risk and two dementia risk scores were associated with differential methylation. The GRS was associated predominantly with methylation differences in cis but also identified a genomic region implicated in Parkinson's disease. Loci associated with dementia risk scores were enriched for loci previously associated with body mass index and alcohol consumption.
Discussion: Dementia risk measures show widespread association with blood-based methylation, generating several hypotheses for assessment by future studies.


BACKGROUND
The pathophysiology of dementia begins many years, possibly decades, before the emergence of clinical symptoms. 1 This long prodromal phase highlights the need for preventive strategies before irreversible brain damage develops; understanding premorbid risk mechanisms is therefore critical. Several approaches to identifying individuals at risk of developing dementia have been devised, including the summation of genetic risk in the form of genetic risk scores (GRSs), the consideration of family history, and the calculation of risk scores that incorporate multiple lifestyle, demographic, and genetic risk factors. 2-4 DNA methylation is an epigenetic modification that, in some contexts, is associated with variation in gene expression. Altered gene expression has been identified in the blood and post-mortem brains of patients with Alzheimer's disease (AD), 5,6 and post-mortem brain-based studies have identified associations between DNA methylation and AD and its neuropathological hallmarks. 7-9 Blood-based studies, although limited by small sample sizes, have also found evidence for AD-associated methylation differences. 10,11 It is not, however, possible to determine from these studies whether methylation differences precede AD onset, making them potentially etiologically informative, or whether they reflect ongoing pathology, compensatory mechanisms, and/or treatment effects. Studies that have identified associations between variation in blood-based DNA methylation and risk factors for dementia (e.g., carrying the apolipoprotein E (APOE) ε4 haplotype, 12,13 aging, 14 and obesity 15) suggest that assessing methylation in this tissue may yield insights into the pathways and processes that lead to dementia.
In this study, by assessing associations between multiple measures of dementia risk and blood-based DNA methylation in AD-free participants, we aim to further understand the mechanisms conferring dementia risk and characterize the role of methylation in these processes.

Participants
Participants were drawn from the Generation Scotland: Scottish Family Health Study (GS:SFHS). 16,17 The cohort comprises ≈24,000

Calculation of dementia risk scores
Four dementia risk scores, henceforth referred to as CAIDE1, CAIDE2, 2 Li, 3 and Reitz, 4 were calculated using data collected at GS:SFHS enrollment or obtained through record linkage (see Figure 1, Supplementary Methods, and Table S1 for information on the contributing variables). To generate each risk score, the contributing variables were scaled and weighted according to the original studies and then summed. The Reitz score 4 was calculated using the weightings devised for participants with either a "possible" or a "probable" diagnosis of Alzheimer's disease (AD). Each score was calculated only for participants within the appropriate age range (CAIDE1/2: 39-64 years; 2 Li: ≥60 years; 3 Reitz: ≥65 years 4).
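The scale-weight-sum procedure and the age restrictions described above can be sketched as follows. This is a minimal illustration, not the scoring code used in the study: the age ranges are taken from the text, but the variable names, cut-offs, and weights in the example are hypothetical placeholders and are not the published CAIDE, Li, or Reitz parameters.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Age ranges as stated in the text (inclusive; None means no upper limit).
AGE_RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "CAIDE1": (39, 64),
    "CAIDE2": (39, 64),
    "Li": (60, None),
    "Reitz": (65, None),
}


def eligible_scores(age: float) -> List[str]:
    """Return the names of the risk scores that apply at this age."""
    return [
        name
        for name, (low, high) in AGE_RANGES.items()
        if age >= low and (high is None or age <= high)
    ]


def risk_score(
    values: Dict[str, float],
    scalers: Dict[str, Callable[[float], float]],
    weights: Dict[str, float],
) -> float:
    """Scale each contributing variable, multiply by its weight, and sum."""
    return sum(weights[name] * scalers[name](values[name]) for name in weights)


# HYPOTHETICAL cut-offs and weights for illustration only; these are NOT
# the published CAIDE, Li, or Reitz parameters.
example_scalers = {
    "sbp": lambda mm_hg: 1.0 if mm_hg > 140 else 0.0,
    "bmi": lambda kg_m2: 1.0 if kg_m2 > 30 else 0.0,
}
example_weights = {"sbp": 2.0, "bmi": 2.0}

print(eligible_scores(62))  # ['CAIDE1', 'CAIDE2', 'Li']
print(risk_score({"sbp": 150, "bmi": 25}, example_scalers, example_weights))  # 2.0
```

Keeping the per-variable scaling functions separate from the weights mirrors the description of each score as "scaled and weighted according to the original studies", so a different score only needs a different pair of dictionaries.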

Genotyping and calculation of Alzheimer's disease genetic risk score
GS:SFHS genotyping has been described previously 18,19 (Supplementary Methods). The AD GRS was calculated using the lead single-nucleotide polymorphism (SNP) from each of the 26 genome-wide significant loci identified through a meta-analysis of AD and parental AD (proxy AD) 20 (Table S2).
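The text does not state how the 26 lead SNPs were combined. A common construction, assumed here, is a weighted sum of effect-allele dosages multiplied by the per-allele log odds ratios from the source GWAS. The sketch below illustrates that assumed construction; the SNP identifiers and odds ratios are hypothetical.

```python
import math
from typing import Dict


def genetic_risk_score(dosages: Dict[str, float], log_odds: Dict[str, float]) -> float:
    """Weighted GRS: sum over lead SNPs of effect-allele dosage x per-allele log OR.

    dosages: effect-allele count per SNP (0, 1, 2; fractional if imputed).
    log_odds: per-allele effect size for the same effect allele from the source GWAS.
    SNPs without a dosage are skipped here; a real analysis would need an
    explicit policy for missing genotypes.
    """
    return sum(beta * dosages[snp] for snp, beta in log_odds.items() if snp in dosages)


# HYPOTHETICAL SNP identifiers and odds ratios, for illustration only.
example_log_odds = {"snp_a": math.log(1.20), "snp_b": math.log(0.90)}
example_dosages = {"snp_a": 2, "snp_b": 1}
grs = genetic_risk_score(example_dosages, example_log_odds)
print(round(math.exp(grs), 3))  # 1.296
```

Working on the log-odds scale makes the score additive across loci; exponentiating it gives the combined odds ratio implied by the genotype (here 1.2² × 0.9).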

Epigenome-wide association studies
Epigenome-wide association studies (EWASs) were performed using linear regression modeling, implemented in limma. 25 A number of sensitivity analyses for the CAIDE1 score were performed in which additional covariates were included one at a time, using the same thresholds for categorizing continuous variables as those implemented in the risk score. These covariates were body mass index (BMI; ≤30 kg/m² or >30 kg/m²); systolic blood pressure (SBP; ≤140 mm Hg or >140 mm Hg); total cholesterol (TC; ≤6.5 mmol/L or >6.5 mmol/L); years of education (≥10, >6 and <10, or ≤6); self-reported alcohol consumption (log10(units of alcohol/week + 1)); and a DNA methylation-based alcohol consumption score derived using the R package dnamalci. 26,27 Limma was used to calculate empirical Bayes moderated t-statistics, from which P values were obtained. The significance threshold in the discovery sample was P ≤ 3.6 × 10⁻⁸. 28 Sites attaining significance in the discovery sample were assessed in the replication sample using a Bonferroni-adjusted threshold of 0.05 divided by the number of sites assessed.
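The covariate coding and the two-stage significance rule described above can be sketched as follows. The thresholds (including P ≤ 3.6 × 10⁻⁸ and 0.05 divided by the number of sites assessed) come from the text; the function names, category labels, and P values are hypothetical. The sketch also assumes that "sites assessed" means discovery hits that are present in the replication data.

```python
import math
from typing import Dict, List, Union

DISCOVERY_P = 3.6e-8  # discovery-sample significance threshold from the text


def code_covariates(bmi: float, sbp: float, tc: float, edu_years: float,
                    alcohol_units_week: float) -> Dict[str, Union[bool, str, float]]:
    """Apply the categorization thresholds listed in the text (labels are hypothetical)."""
    if edu_years >= 10:
        education = "ge10"
    elif edu_years > 6:
        education = "gt6_lt10"
    else:
        education = "le6"
    return {
        "bmi_over_30": bmi > 30,      # kg/m^2
        "sbp_over_140": sbp > 140,    # mm Hg
        "tc_over_6_5": tc > 6.5,      # mmol/L
        "education": education,
        "alcohol_log10": math.log10(alcohol_units_week + 1),
    }


def replicated_sites(discovery_p: Dict[str, float],
                     replication_p: Dict[str, float],
                     alpha: float = 0.05) -> List[str]:
    """Discovery hits that also pass a Bonferroni threshold (alpha / n assessed)."""
    hits = [cpg for cpg, p in discovery_p.items() if p <= DISCOVERY_P]
    assessed = [cpg for cpg in hits if cpg in replication_p]  # assumption: must be measured
    if not assessed:
        return []
    threshold = alpha / len(assessed)
    return [cpg for cpg in assessed if replication_p[cpg] <= threshold]


# Made-up P values for illustration only.
disc = {"cg_a": 1e-9, "cg_b": 1e-7, "cg_c": 2e-8}
rep = {"cg_a": 0.01, "cg_b": 0.001, "cg_c": 0.03}
print(replicated_sites(disc, rep))  # ['cg_a']
```

In this toy example, two sites pass the discovery threshold, so the replication threshold is 0.05 / 2 = 0.025 and only `cg_a` replicates.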

EWAS meta-analysis
Inverse standard error-weighted fixed-effects meta-analyses of the discovery and replication EWAS results were performed using METAL. 29 Sites attaining a meta-analysis P ≤ 3.6 × 10⁻⁸ were considered significant.
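For a single CpG site, the fixed-effects combination of discovery and replication estimates can be sketched as follows. This sketch assumes that the standard-error-based scheme weights each study by 1/SE², the usual inverse-variance form, which is our understanding of METAL's standard-error scheme. It is an illustration of the calculation, not METAL itself, and the inputs are made up.

```python
import math
from typing import Sequence, Tuple


def fixed_effects_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effects meta-analysis for one site.

    Returns (pooled beta, pooled SE, two-sided P).
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # equals 2 * (1 - Phi(|z|))
    return beta, se, p


beta, se, p = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
print(round(beta, 4), round(se, 4), round(p, 4))  # 0.2 0.0707 0.0047
```

With equal standard errors, the pooled estimate is the simple mean of the two estimates, and the pooled SE shrinks by a factor of √2.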

Identification of differentially methylated regions
Differentially methylated regions (DMRs) were identified using the dmrff.meta function in dmrff. 30 DMRs were defined as regions of 2 to 30 sites, each with an EWAS meta-analysis P ≤ .05 and a consistent direction of effect, in which adjacent sites were separated by ≤500 bp. DMRs with Bonferroni-adjusted P values ≤ .05 were declared significant.
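The candidate-region rule stated above can be sketched as follows. Only the grouping step is shown: runs of sites with P ≤ .05, the same direction of effect, and gaps of no more than 500 bp are kept if they contain 2 to 30 sites. This is a simplified illustration, not dmrff's implementation. dmrff also computes a combined region statistic that accounts for correlation between sites, and its handling of runs longer than 30 sites may differ from this sketch, which simply discards them. The `Site` type and the example data are hypothetical.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    cpg: str
    chrom: str
    pos: int      # base-pair position
    beta: float   # meta-analysis effect estimate
    p: float      # meta-analysis P value


def candidate_regions(sites: List[Site], p_cutoff: float = 0.05, max_gap: int = 500,
                      min_size: int = 2, max_size: int = 30) -> List[List[Site]]:
    """Group runs of nominally significant, same-direction, closely spaced sites."""
    ordered = sorted(sites, key=lambda s: (s.chrom, s.pos))
    regions: List[List[Site]] = []
    current: List[Site] = []

    def close_region() -> None:
        # Keep the run only if its size is within [min_size, max_size].
        if min_size <= len(current) <= max_size:
            regions.append(list(current))
        current.clear()

    for site in ordered:
        if site.p > p_cutoff:
            close_region()  # a non-significant site breaks the run
            continue
        if current:
            prev = current[-1]
            contiguous = site.chrom == prev.chrom and site.pos - prev.pos <= max_gap
            same_direction = (site.beta > 0) == (prev.beta > 0)
            if not (contiguous and same_direction):
                close_region()
        current.append(site)
    close_region()
    return regions


# Made-up sites for illustration only.
example_sites = [
    Site("a", "chr1", 100, 0.10, 0.01),
    Site("b", "chr1", 300, 0.20, 0.02),
    Site("c", "chr1", 700, 0.10, 0.03),
    Site("d", "chr1", 2000, -0.10, 0.01),  # 1,300 bp gap: starts a new run
    Site("e", "chr1", 2100, -0.20, 0.50),  # P > 0.05: breaks the run
    Site("f", "chr2", 50, 0.10, 0.01),
    Site("g", "chr2", 60, -0.10, 0.01),    # direction flips: breaks the run
]
print([[s.cpg for s in r] for r in candidate_regions(example_sites)])  # [['a', 'b', 'c']]
```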

EWAS and GWAS catalog look-ups
The

Identification of meQTLs
Methylation quantitative trait loci (meQTLs) for the AD GRS-associated DMPs were identified using the discovery sample. The quality control, normalization, and pre-correction of the data prior to the meQTL analyses have been described previously. 32

Epigenome-wide association study sample demographics
Participant numbers and sample demographic information are shown in Table S3.

Genetic risk for Alzheimer's disease
Querying the GWAS catalog 34 with the 34 gene names annotated to the 68 meta-DMPs unsurprisingly identified many terms related to AD and its neuropathological hallmarks (Table S7), the most significant being "AD or family history of AD" (P = 1.77 × 10⁻²⁷).
Table 1. Top 20 DMPs associated with the AD GRS in a meta-analysis of the discovery and replication samples.

Identification of differentially methylated regions
The differentially methylated region (DMR) meta-analysis identified 18
Gene ontology (GO) analysis using the combined DMP and DMR results identified 18 terms, the most significant of which was "amyloid-beta formation" (P = 3.68 × 10⁻¹⁰; Table 2). No significant KEGG pathways were identified.

Mid-life dementia risk scores
The CAIDE1 and CAIDE2 risk scores assess the risk of developing dementia in 20 years' time in individuals 39 to 64 years of age. 2 CAIDE2 takes into account the same risk factors as CAIDE1 (with different weightings) and also considers apolipoprotein E (APOE) ε4 carrier status.

Sensitivity analyses
The extent to which the BMI component of the CAIDE1 score drives the observed CAIDE1 associations was assessed by performing an EWAS meta-analysis in which BMI was included as an additional covariate. After adjustment for BMI, only 11 of the original 227 meta-DMPs remained significant (Table S11).

EWAS results
Twenty-four of the 88 meta-DMPs that are represented on the 450K array, including the most significant DMP, cg06690548, have previously been associated with alcohol consumption (P ≤ 1 × 10⁻⁵). 26 […] show a significant association with CAIDE1 with a consistent direction of effect. This overlap is highly significant (P < 2 × 10⁻¹⁶).
Because alcohol consumption showed a small but significant correlation with the CAIDE1 score (r = 0.091; 95% CI = 0.065 to 0.118; P = 2.60 × 10⁻¹¹), the potential for alcohol consumption to drive the observed associations between CAIDE1 and DNA methylation was assessed by including alcohol consumption, measured by (1) self-report or (2) a polyepigenetic risk score, 26,27 as an additional covariate.
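The text reports a 95% confidence interval for the correlation between alcohol consumption and the CAIDE1 score, but it does not state how the interval was computed or which n was used. A standard approach, assumed here, is the Fisher z-transformation. The sketch below shows that calculation for any r and n and makes no attempt to reproduce the reported interval.

```python
import math
from statistics import NormalDist
from typing import Tuple


def pearson_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Confidence interval for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)                                # Fisher z = atanh(r)
    se = 1.0 / math.sqrt(n - 3)                      # approximate SE of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # ~1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = pearson_ci(0.091, 5000)  # hypothetical n, for illustration only
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Because the interval is symmetric on the z scale and then back-transformed, it is slightly asymmetric around r on the correlation scale.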

Identification of differentially methylated regions
The DMR meta-analysis of the discovery and replication samples identified 57 CAIDE1-associated DMRs (all Bonferroni-adjusted P < 0.044; Table S16), each comprising two to seven CpGs. In total,

Other measures of dementia risk
The other measures of dementia risk assessed were (1) dementia family history (FH) and (2) two late-life dementia risk scores that predict the risk of developing dementia in those aged ≥60 or ≥65 years. 3,4 EWASs of the discovery sample (minimum (min).

DISCUSSION
We have assessed DNA methylation associations with a range of dementia risk measures in large discovery and replication samples comprising Alzheimer's disease (AD)-free participants, and we report multiple loci as being associated with AD genetic risk and two multifactorial mid-life risk scores for dementia.
All but one of the loci associated with the AD genetic risk score (GRS) were located within 30 kb of the genome-wide association study (GWAS) loci used to derive the GRS, 20 with methylation quantitative trait loci (meQTL) analysis supporting the involvement of cis meQTLs. The exception, differentially methylated position (DMP) cg14354618 on chromosome 11, was associated with trans meQTLs in a GWAS risk locus on chromosome 19. cg14354618 is located in a CpG island in AP001979.1. Genetic variation annotated to AP001979.1 has been associated with Parkinson's disease, 37,38 body fat percentage, 39 and sugar consumption 40 but has not been associated with AD in large-scale GWASs. 41,42 The clinical features and pathologies associated with AD and Parkinson's disease overlap to a degree, and certain genetic variants are associated with both. 43,44 Moreover, obesity and hyperglycemia have been implicated as dementia risk factors. 45 Taken together, these findings suggest that this locus is a plausible AD risk locus that warrants further investigation.
Considering both the meta-DMP and differentially methylated region (DMR) results, two regions harbor a large number of AD GRS-associated sites. These regions contain (1) BIN1 and (2) PVRL2, APOE, APOC4, and APOC2 (henceforth referred to as the APOE locus).
The APOE locus has not been identified previously by brain-based epigenome-wide association studies (EWASs) of AD neuropathological hallmarks 7,46 or a blood-based AD case-control EWAS 44 ; larger samples might be required to detect association between methylation at this locus and AD.
In contrast, several studies have identified altered methylation of BIN1 in AD patients or in association with AD neuropathological hallmarks. 7,47,48 These findings are of particular interest, as altered BIN1 expression in the brain has been reported in AD 49-51 and DNA methylation has been suggested to regulate BIN1 expression. 52 We identified a mixture of hyper- and hypomethylation in the upstream region and gene body, and hypermethylation in the downstream region. Although none of the identified sites directly replicated those identified by previous studies, it is noteworthy that one of our hypermethylated meta-DMPs (cg18813565) is located only 31 bp from a site (which failed our quality control) at which increased methylation in the dorsolateral prefrontal cortex has been associated with neuritic plaque burden and AD diagnosis. 47
it could not account for the CAIDE1-associated differences in methylation. This finding is of interest in light of the observed associations between excessive alcohol consumption and dementia risk. 53 Our findings suggest that the risk factors contributing to the CAIDE1 score and alcohol consumption might confer risk for dementia via independent effects on common pathways.
The loci implicated by our analyses of the AD GRS and the CAIDE1 score did not overlap, nor did they implicate common genes. In keeping with this, the correlation between the scores was small and non-significant. This lack of concordance might be attributable to differences in the methodology used to create the scores: whereas the CAIDE1 score was trained using a sample comprising mixed dementia cases (of which ≈75% were diagnosed with AD), 2 the AD GRS was devised using a sample comprising AD and proxy AD cases. 20 Moreover, the CAIDE1 score predominantly comprises cardiovascular risk factors for dementia, meaning that it is likely to identify a subpopulation of those at risk for dementia.
We did not observe any DNA methylation associations with AD family history (FH) or two late-life dementia risk scores. The lack of associations with AD FH is somewhat surprising, as this has been shown previously to be a good AD proxy-phenotype. 20 Our failure to observe significant associations for these traits may reflect a lack of statistical power, particularly as the samples available for the late-life dementia risk scores were relatively small.
It is important to note some additional strengths and limitations of our study. Although it would clearly be desirable to study DNA methylation in brain tissue, growing evidence highlights the contribution of systemic factors to dementia pathogenesis. 54 Methylation studies in blood are therefore necessary to provide a holistic characterization of the processes that contribute to dementia development. Moreover, profiling blood methylation permits both biomarker identification and longitudinal analyses that characterize the dynamic processes underlying dementia pathogenesis.
An important limitation of our study is that its cross-sectional design precludes causal inference; in particular, it is difficult to determine whether the methylation differences assessed play a causal role in the development of dementia. In some cases, causal inference analyses of relationships with important intermediary variables, such as cognitive ability and cognitive decline, together with Mendelian randomization, may help to delineate likely causality; future studies should assess this possibility.
Ultimately, the longitudinal assessment of cognitive decline and the development of dementia will also be necessary to address questions about causation. The availability of longitudinal data would also permit the development of an epigenetic (and potentially a multifactorial genetic, epigenetic, and lifestyle factor) predictor of dementia. An important conceptual issue that must be considered when attempting to determine causality from longitudinal data is that the pathogenesis of dementia is itself a gradual process involving quantitative changes in multiple biological systems, which eventually result in the binary diagnosis of dementia. As such, it might not be possible to strictly delineate the temporal relationship between risk factors, their biological correlates, and the onset of dementia. Instead, the identification of co-occurring processes might yield experimentally testable hypotheses. An additional consideration is that the non-genetic risk factors that contribute to the scores assessed are themselves only associated with the development of dementia and do not necessarily play a causal role. Future studies that aim to delineate the causal contribution of these factors to dementia will facilitate the development of risk scores whose primary purpose is the investigation of pathogenic mechanisms.
Other limitations concern the quality of the variables used in the