Investigating causality in the association between DNA methylation and type 2 diabetes using bidirectional two-sample Mendelian randomisation

Aims/hypothesis Several studies have identified associations between type 2 diabetes and DNA methylation (DNAm). However, the causal role of these associations remains unclear. This study aimed to provide evidence for a causal relationship between DNAm and type 2 diabetes. Methods We used bidirectional two-sample Mendelian randomisation (2SMR) to evaluate causality at 58 CpG sites previously detected in a meta-analysis of epigenome-wide association studies (meta-EWAS) of prevalent type 2 diabetes in European populations. We retrieved genetic proxies for type 2 diabetes and DNAm from the largest genome-wide association study (GWAS) available. We also used data from the Avon Longitudinal Study of Parents and Children (ALSPAC, UK) when associations of interest were not available in the larger datasets. We identified 62 independent SNPs as proxies for type 2 diabetes, and 39 methylation quantitative trait loci as proxies for 30 of the 58 type 2 diabetes-related CpGs. We applied the Bonferroni correction for multiple testing and inferred causality based on p<0.001 for the type 2 diabetes to DNAm direction and p<0.002 for the opposing DNAm to type 2 diabetes direction in the 2SMR analysis. Results We found strong evidence of a causal effect of DNAm at cg25536676 (DHCR24) on type 2 diabetes. An increase in transformed residuals of DNAm at this site was associated with a 43% (OR 1.43, 95% CI 1.15, 1.78, p=0.001) higher risk of type 2 diabetes. We inferred a likely causal direction for the remaining CpG sites assessed. In silico analyses showed that the CpGs analysed were enriched for expression quantitative trait methylation sites (eQTMs) and for specific traits, dependent on the direction of causality predicted by the 2SMR analysis. Conclusions/interpretation We identified one CpG mapping to a gene related to the metabolism of lipids (DHCR24) as a novel causal biomarker for risk of type 2 diabetes. CpGs within the same gene region have previously been associated with type 2 diabetes-related traits in observational studies (BMI, waist circumference, HDL-cholesterol, insulin) and in Mendelian randomisation analyses (LDL-cholesterol). Thus, we hypothesise that our candidate CpG in DHCR24 may be a causal mediator of the association between known modifiable risk factors and type 2 diabetes. Formal causal mediation analysis should be implemented to further validate this assumption. Graphical Abstract Supplementary Information The online version of this article (10.1007/s00125-023-05914-7) contains peer-reviewed but unedited supplementary material.


Introduction
There is growing interest in understanding the role of DNA methylation (DNAm) in the context of type 2 diabetes. In epidemiological studies, associations have been identified between DNAm and both prevalent [1][2][3][4][5][6][7][8] and incident [9][10][11][12][13][14] type 2 diabetes. Associations between DNAm and type 2 diabetes may arise through different mechanisms. For example, changes in DNAm may be causal for type 2 diabetes. Conversely, type 2 diabetes may induce consequent changes in DNAm. It is also possible that the observed association between DNAm and type 2 diabetes arises as a result of confounding from a third factor that is independently associated with both type 2 diabetes and DNAm. Defining the causal relationship between DNAm and type 2 diabetes is important because it provides new information about the molecular pathways involved in disease incidence, and potentially progression, allowing new insights into targets for intervention. A method that uses genetic variants to estimate the causal direction of effect between a modifiable exposure and an outcome, while controlling for unobserved confounders, is Mendelian randomisation (MR) [15].
Previous studies have investigated the causal direction of effect between DNAm and type 2 diabetes using MR. In the context of incident type 2 diabetes, MR was used to assess the causal link between 18 incident type 2 diabetes-associated CpG sites and type 2 diabetes [11]. Methylation quantitative trait loci (mQTL) were available for 16 of the 18 type 2 diabetes-associated CpG sites. There was nominal evidence for a direct causal association of cg00574958 (CPT1A) with type 2 diabetes. The authors of this study also assessed the causal effects of BMI and glycaemic traits on methylation at the 18 CpG sites associated with incident type 2 diabetes but found no evidence to support this causal pathway [11]. In the context of prevalent type 2 diabetes, a study of 232 participants with type 2 diabetes and 197 control participants from a Korean cohort identified 12 CpG sites associated with type 2 diabetes. The association between these CpG sites and metabolic traits was measured in a further 1018 individuals [7]. MR analyses revealed that there was a likely causal effect of fasting glucose on cg00574958 (CPT1A). There was no evidence to suggest a causal effect of type 2 diabetes on methylation in this study [7].
In a study using genotype as a causal anchor to assess future risk of type 2 diabetes, DNAm did not appear to be on the causal pathway between known type 2 diabetes genetic risk variants and type 2 diabetes, with the exception of KCNQ1 [16]. A further study of type 2 diabetes genetic risk variants that were also cis-mQTLs (defined as local variants <1 Mb from the DNAm site) assessed the causal pathway from methylation to type 2 diabetes using MR. This study identified CpG sites at four loci (HNF1B, KCNJ11, IGF2BP2 and WFS1) that were likely to be causally associated with future risk of type 2 diabetes [17].
While efforts have been made to estimate the causal direction of effect between DNAm and type 2 diabetes, conclusions remain sparse and inconsistent. In part, this is because previous studies have used modestly sized datasets (<1000 individuals). The first aim of this study was to comprehensively investigate the causal direction of effect between DNAm and type 2 diabetes using a bidirectional two-sample MR (2SMR) approach, selecting genetic instruments that were (1) strongly associated with the exposure; (2) associated with the outcome solely via the exposure; and (3) not related to the confounders [18]. DNAm was assessed at CpG sites identified in the largest meta-EWAS of prevalent type 2 diabetes available in European populations (340 individuals with type 2 diabetes and 3088 control participants) [6]. The second aim of this study was to determine the functional role of CpGs stratified by their likely causal direction of effect on type 2 diabetes. We hypothesised that CpGs may have different causal directions of association with type 2 diabetes and that this may indicate their specific mechanism of action in type 2 diabetes onset and progression.

Study samples
Forward 2SMR: type 2 diabetes as causal of differences in DNAm at candidate CpGs We extracted 148 genetic variants (SNPs) associated with type 2 diabetes from four DIAGRAM consortium GWAS [19] (see electronic supplementary material [ESM] Table 1). Three included multiethnic samples [20][21][22] and one included people of European ancestry [23]. We selected a SNP if it was (1) identified as an index variant in a GWAS meta-analysis; (2) found within a 99% credible set from an index variant, with an equal or higher posterior probability than the index variant; and (3) identified in the exome mapping close to a well-established locus for type 2 diabetes, with a minor allele frequency (MAF) >0.05. Two levels of significance were considered in the selection of SNPs: a genome-wide significance threshold of p<5.0×10 -8 and a locus-wide significance threshold of p<1.0×10 -5 (genome-wide complex trait analysis joint regression model). SNPs were excluded if they had incomplete risk allele data, no reported effect estimate (OR), no p value or a MAF <0.05. After quality control in MR-Base [24], the initial list of 148 type 2 diabetes SNPs was reduced to 62 SNPs (linkage disequilibrium [LD] r 2 <0.2) [25] in our outcome sample (ALSPAC-ARIES) (ESM Table 2).
We used data from ALSPAC-ARIES [26][27][28] (ESM Methods S1) to extract estimates of the association between the 62 type 2 diabetes SNPs and DNAm levels at 58 CpG sites previously associated with prevalent type 2 diabetes at p<1.0×10 -5 (six of 58 CpGs had an epigenome-wide p<1.33×10 -7 ) [6]. For this analysis, we included cross-sectional data from 1243 middle-aged participants (age range 31-75 years) irrespective of their type 2 diabetes status. We could not use the Genetics of DNA Methylation Consortium (GoDMC) [29] as our outcome sample because type 2 diabetes SNPs, or related SNPs in high LD (r 2 >0.6), were associated with CpGs of interest at p>10 -8 , which was the maximum p value threshold used by GoDMC to report summary data. Figure 1 shows the study design used to conduct the forward 2SMR analysis. Reverse 2SMR: DNAm at candidate CpGs as causal risk factors for type 2 diabetes Using GoDMC summary data [29], we retrieved 41 mQTL associated with DNAm in blood at 31 of the 58 CpG sites previously reported in a meta-EWAS of type 2 diabetes [6] (ESM Methods S2). We selected mQTL with a p<10 -8 for cis-mQTL (SNP <1 Mb from CpG site) and a p<10 -14 for trans-mQTL (defined as distal variants >1 Mb from the CpG site). In total, five of 41 mQTL identified were trans-mQTL. When there were multiple mQTL for a single CpG, we used the clump_data function in the R package TwoSampleMR to select independent SNPs (ESM Methods S3). Data for each mQTL were obtained using a fixed-effect meta-analysis, with effect estimates interpreted as a unit change in inverse normal transformed residuals of DNAm, per additional effect allele.
We verified that mQTL used in the reverse MR were independent of SNPs for type 2 diabetes in the forward MR using a LD threshold of r 2 <0.01 to avoid bias in the causal estimate due to pleiotropic effects or reverse causation [30]. The correlation between type 2 diabetes SNPs and mQTL was calculated using Ldlink (version 3, https:// ldlink. nci. nih. gov/) [31], selecting as the reference panel genetic data from European samples in the 1000 Genomes project phase 3.
As our outcome sample, we selected the two largest type 2 diabetes GWAS available [20,32]. Associations of mQTL with type 2 diabetes were extracted using the MRInstruments and TwoSampleMR R packages. When mQTL information was not available in the outcome sample, MR-Base looked for alternative SNPs with a LD r 2 >0.8 with the target mQTL SNP. After MR-Base data harmonisation, we were able to successfully extract outcome data for 39 of the initial 41 mQTL SNPs typing 30 meta-EWAS CpGs. The study design used to conduct the reverse 2SMR analysis is shown in Fig. 2

Statistical analyses
Forward 2SMR: conducting a SNP-CpG analysis in ALSPAC-ARIES We measured associations between 62 independent type 2 diabetes SNPs and 58 CpG sites previously reported in a meta-EWAS of type 2 diabetes using a standardised protocol [29]. We conducted linear regressions with the genotype for type 2 diabetes as the exposure and DNAm (inverse normal transformed residuals) at each CpG site as the outcome using an additive genetic model. Briefly, in the first stage of the analysis, we extracted complete genetic and DNAm data for 1243 middle-adults in ALSPAC-ARIES (age range 31-75 years) and performed quality control on each dataset separately (ESM Methods S4-S6). For genome-wide SNPs and CpG sites that passed the quality control, we selected only those corresponding to type 2 diabetes SNPs and meta-EWAS CpGs, respectively, and conducted regression analyses using the MatrixeQTL R package version 2.3 [33]. We tested the direct effect of the genotype for type 2 diabetes on DNAm at meta-EWAS CpGs at p<1.4×10 -5 or α=0.05/62 type 2 diabetes SNPs×58 CpGs. We interpreted effect estimates in the SNP-CpG analysis as the difference in residuals of DNAm (inverse normal transformed values), per additional risk allele for type 2 diabetes. Mendelian randomisation and MR-Base analysis MR is a statistical method used to infer causality in observational associations that has been documented in detail elsewhere [15]. For this study, we carried out a 2SMR analysis using summary data from two independent but comparable populations [34], both with moderate power. MR-Base was used to conduct causal analyses [24]. Further detail is provided in ESM Methods S3. Estimated causal effects were interpreted as the effect of type 2 diabetes on a unit increase in residuals of DNAm (forward 2SMR) or as the odds of type 2 diabetes per unit increase in residuals of DNAm (reverse 2SMR).
Determining the true direction of association For associations analysed bidirectionally, we inferred the likely causal direction using the causal effect estimate with the smallest p value, which was also consistent with the direction of association found in the observational analysis (i.e. meta-EWAS of type 2 diabetes). 'Inconclusive' associations with bidirectional MR data had a p value >0.1 in each direction of the analysis. Associations with analysed MR data that were in a single direction only were regarded as 'inconclusive, single direction'.
We used the Steiger test to confirm that the true direction of association was the one specified in the analysis. This test has its limitations [35] and it may perform better when using continuous rather than categorical exposures (i.e. more reliability when testing directionality in the reverse rather than in the forward 2SMR analysis).

Functional inspection of MR signals using publicly available databases
We classified meta-EWAS CpGs into three subgroups based on their most likely causal direction of association with type 2 diabetes: (1) type 2 diabetes causal of DNAm variation; (2) DNAm causal of type 2 diabetes risk; and (3) inconclusive direction of association. We looked for cis-eQTM associated with meta-EWAS CpGs in the BIOS QTL browser (https:// molge nis26. gcc. rug. nl/ downl oads/ biosq tlbro wser/) [36]. In addition, results from the EWAS Catalog [37] were grouped into related phenotypes [38] and tested for enrichment among the 58 meta-EWAS CpGs analysed. Enrichment of each CpG subgroup for specific phenotypes was reported using ORs and 95% CIs.
The p values in the enrichment analysis were one-sided. We considered evidence of enrichment at p<0.05 per traits analysed.

Ethical disclosure
Ethical approval for ALSPAC and its substudy ARIES was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Consent for use of biological samples was collected in accordance with the Human Tissue Act (2004).

Forward 2SMR: type 2 diabetes is suggestively associated with lower DNAm at cg20812370 (PBX1), previously identified in a meta-EWAS of type 2 diabetes
Summary statistics of type 2 diabetes-SNP associations are shown in ESM Table 2. Association estimates between type 2 diabetes SNPs and CpGs are presented in ESM Table 3.
Using forward 2SMR, we observed evidence of causality (adjusted p<0.001 or unadjusted p<0.05) between type 2 diabetes and lower levels of DNAm at the CpGs cg20812370 (PBX1) (p=0.002) and cg01577083 (RBFOX1) (p=0.023) (Fig. 3, ESM Figs 1 and 2, ESM Methods S7). The causal effects at both CpGs were directionally consistent with the results of the observational analysis (meta-EWAS), but absolute effect sizes were always larger in the MR analysis than in the observational analysis (Table 1). For associations at cg20812370 (PBX1) and cg01577083 (RBFOX1), there was little evidence of heterogeneity in the effect of 62 type 2 diabetes SNPs on DNAm levels based on the results of the Cochran's Q test (Q range 52.9-74.8, p value range 0.10-0.75).
Using MR-Egger, weighted mode and weighted median as sensitivity analyses, we observed four CpGs with evidence of causality with type 2 diabetes at p<0.05: the previously detected cg20812370 (PBX1) and cg01577083 (RBFOX1) plus cg20456243 (SPEG) and cg26766064 (MIR657) ( Table 2). In all cases, the magnitude and direction of the effect estimates were similar across analyses, but smaller p values were generally seen when using the weighted median than when using the weighted mode and MR-Egger regressions. The results of these sensitivity analyses were directionally consistent with estimates of the inverse variance weighted (IVW) regression and with the observational analysis (meta-EWAS).
Overall, in the forward 2SMR analysis, we found no evidence of weak instrument bias based on values of the F statistic, which ranged from 19.7 to 274.8 (strong instrument if F statistic >10).

Reverse 2SMR: elevated DNAm at cg25536676 (DHCR24) is associated with an increased risk of type 2 diabetes
Summary estimates of the association between the 39 mQTL SNPs and 30 meta-EWAS CpGs are shown in ESM Table 4. Association estimates of the 39 mQTL SNPs with type 2 diabetes are presented in ESM Table 5. Overall, none of these mQTL was directly associated with type 2 diabetes at the GWAS significance threshold (p<5.0×10 -8 ). Four  Table 1 Observational associations and estimates of causal effect between prevalent type 2 diabetes and differences in DNAm at five CpGs observed to have the strongest evidence of causality in the forward 2SMR analysis a Meta-EWAS of prevalent type 2 diabetes (n=3428) [6] Associations were considered significant at p<0.001 or α=0.05/58 CpGs analysed in the forward 2SMR analysis  Table 5).
In the reverse 2SMR analysis, we identified a strong causal effect (p<0.002 or α=0.05/30 CpGs) between cg25536676 (DHCR24) and type 2 diabetes using the Wald ratio (Table 3). For this CpG, we found an opposite direction of association between the causal and the observational analysis. Similar causal associations with type 2 diabetes were seen at cg25536676 using the IVW regression (ESM Table 6), with no evidence of heterogeneity (Cochran's Q=2. 4, p=0.30). The Steiger test suggested that the true direction of the association at cg25536676 was from DNAm to type 2 diabetes (Steiger p=3.8×10 -147 , R 2 for CpG=0.04, R 2 for type 2 diabetes=5.3×10 -5 ). We conducted MR sensitivity analysis for the association at cg25536676 (ESM Table 6), but these results may be unreliable because of the small number of mQTL used as instruments (n=3).
We observed evidence (unadjusted p<0.05) of a causal effect of DNAm on type 2 diabetes at four other CpG sites ( Table 3). Estimates of the causal and the observational analysis were consistent for most of these associations, except for the CpG in SYNM. No other MR analyses were carried out at these four CpGs as they were proxied by a single mQTL.
Overall, the results of the Steiger test confirmed that, for the top CpGs identified in the reverse 2SMR analysis, the true direction of association was from DNAm to type 2 diabetes (Steiger p value range 1.2×10 -164 -1.5×10 -94 , mean R 2 for CpGs=0.03 vs mean R 2 for type 2 diabetes=2.3×10 -5 ). A mean value of the F statistic of 20.3 suggested a low probability of obtaining biased results as a result of weak instruments. Figure 4 provides volcano and forest plots summarising the results of the reverse 2SMR analysis.

Bidirectional interrogation of associations identified in the 2SMR analysis
To determine the true direction of the associations with type 2 diabetes, we summarised the bidirectional causal effects derived from the 2SMR analysis. For the seven top CpGs with bidirectional 2SMR data (two detected in the forward 2SMR and five in the reverse 2SMR analysis) (Table 4), we confirmed that the true direction of association was the one we originally interrogated in the single-direction analysis. At cg10082515 (EIF3IP1), we observed the same direction of Table 2 Estimates of the causal effect between prevalent type 2 diabetes and difference in DNAm at five CpGs using additional MR sensitivity analyses for the forward 2SMR analysis MR-Egger: sensitivity analysis to account for horizontal pleiotropy. Weighted median and weighted mode: sensitivity analyses that allow for some instruments to be invalid, while generating unbiased causal estimates with the set of valid proxies Associations were considered significant at p<0.001 or α=0.05/58 CpGs analysed in the forward 2SMR analysis   association between the observational and the bidirectional causal analysis (Table 4). For four of the top six CpGs previously identified in a meta-EWAS of type 2 diabetes that had bidirectional 2SMR data (CpGs in TXNIP, HDAC4, SYNM and ABCG1), the observational and causal effects were consistent only at HDAC4 (cg00144180). At this CpG, the MR results suggested that the true direction of association was from DNAm to type 2 diabetes, similar to what we observed at the CpG cg16765088 in SYNM (ESM Table 7). For the CpGs cg19693031 (TXNIP) and cg06500161 (ABCG1), the results of the 2SMR analysis were inconclusive (i.e. large p values in both directions of the analysis). The CpG cg00574958 in CPT1A, previously shown as causally associated with fasting glucose [7], was not associated with type 2 diabetes in the single direction in which it was analysed (forward 2SMR: type 2 diabetes to DNAm).

In silico interrogation of the functional role of meta-EWAS CpGs analysed by bidirectional 2SMR
Results of the in silico analysis suggested that CpGs included in the bidirectional 2SMR analysis were enriched for eQTMs (OR 5.4, 95% CI 2.7, 11.1, p=1.5×10 -7 ) ( Table 5). We also interrogated the enrichment of meta-EWAS CpGs for traits included in the EWAS Catalog, categorising CpGs into three groups according to their likely causal direction of effect based on the bidirectional 2SMR analysis (see Methods, Functional inspection of MR signals using publicly available databases). Common to all groups was enrichment for ancestry/ethnicity and anthropometric and cardiometabolic traits. Unique to the first group (type 2 diabetes likely to be causal of DNAm) was enrichment for neurological and perinatal traits, while unique to the second group (DNAm likely to be causal of type 2 diabetes) was enrichment for lipid lipoproteins, alcohol and metabolites. In the third group of CpGs with an inconclusive direction of association, we found enrichment for all the above traits in addition to age and tissue type. Forest plots summarising the results of the enrichment analysis are presented in ESM Fig. 3.

Discussion
In this study we conducted a bidirectional 2SMR analysis to investigate the direction of causality between prevalent type 2 diabetes and DNAm at 58 CpGs previously identified in a meta-EWAS of type 2 diabetes among European populations.  assessing the MR and observational estimates, we attempted to infer the likely causal direction for the 30 CpGs analysed bidirectionally. Considering the consistency of the direction of causality with estimates from the observational analysis, as well as the smallest p value in the 2SMR analysis (p<0.1), we concluded that, for 15 (50%) of the 30 CpGs, DNAm was likely to be causal of type 2 diabetes and for 10 (33%) of the 30 CpGs type 2 diabetes was likely to be causal of DNAm; for the remaining five (17%) CpGs the results were inconclusive. In silico analyses showed that some of the CpGs analysed correspond to eQTMs, and that CpGs may be enriched for specific traits depending on the direction of causality predicted by the 2SMR analysis.

Findings of the forward 2SMR analysis
No associations were identified in the forward 2SMR analysis. This may be because there were power limitations of the data available for the analysis of the genotype-outcome association (i.e. ALSPAC-ARIES vs GoDMC). We could not confirm the previously reported nominal and robust associations of CPT1A methylation with type 2 diabetes [11] and fasting glucose [7] in our 2SMR analysis.

Findings of the reverse 2SMR analysis
The results of the bidirectional 2SMR analysis suggested that most associations identified relate to differences in DNAm influencing the risk of type 2 diabetes, rather than the opposite, with the most robust evidence obtained at cg25536676 (DHCR24). We found that increased levels of inverse normal transformed residuals of DNAm at this CpG were associated with a 43% higher risk of type 2 diabetes and this result was consistent in various sensitivity analyses; however, this was opposite to the result found in the observational analysis [6]. One reason for the opposite direction of effect observed between the observational meta-EWAS and the MR analysis at cg25536676 is residual confounding of the observational analysis. Potential confounding factors include BMI, lipid profile or medication status [39]. Observational analysis at other DHCR24 CpG sites has also identified an inverse relationship between methylation and incident type 2 diabetes but this was attenuated after adjusting for smoking, BMI and follow-up time [12]. We are not aware of other studies assessing the causal association between a CpG in DHCR24 and type 2 diabetes, but there is evidence of causality with LDL-cholesterol (LDL-C) levels in the same locus [40] and the direction of association  Thus, the association we found in the MR analysis between type 2 diabetes and DNAm at cg25536676 (DHCR24) may reflect an underlying effect of LDL-C on DNAm at DHCR24. No evidence of heterogeneity or pleiotropic effects was found for the association at cg25536676 (DHCR24). DNAm at cg25536676 has been previously found in association with total cholesterol [41] in whole blood and with perinatal traits such as gestational age [42] and birthweight [43] in cord blood. Other CpGs in the region of DHCR24 (cg17901584 and cg27168858) have been associated with HDL-cholesterol (HDL-C) [41], triglycerides [44], fasting insulin [45,46], BMI, HbA 1c , incident type 2 diabetes [46], statin use [39], waist circumference [47] and LDL-C [40]. Both nearby CpGs in DHCR24 were positively associated with levels of HDL-C and LDL-C, but only cg17901584 was negatively associated with the remaining cardiometabolic traits. Using causal analysis to overcome the effect of residual confounding, it was previously demonstrated that LDL-C, but not HDL-C, was causally associated with methylation at DHCR24 [40]. In addition, it was suggested that the causal direction of effect was from LDL-C to changes in DNAm at DHCR24 and not the opposite. Our MR result at the type 2 diabetes-related CpG cg25536676 is directionally consistent with the association identified with LDL-C at the nearby CpG in DHCR24. Evidence of causality has been identified between BMI (exposure) and DHCR24 DNAm (outcome) [46]. Because obesity and lipid dysregulation are well-known hallmarks of type 2 diabetes, and because DHCR24 encodes for an enzyme related to the metabolism of cholesterol (3-hydroxysterol-24 reductase) [41], an association between methylation at DHCR24 and type 2 diabetes is plausible. Thus far, evidence suggests that cg25536676 (DHCR24) may act as a causal mediator in the association between known risk factors (i.e. LDL-C) and type 2 diabetes. However, a formal two-step 2SMR analysis needs to be implemented to prove this hypothesis. For LDL-C, there is observational [48] and causal [49] evidence supporting an inverse association between circulating LDL-C levels and type 2 diabetes. This association is in the opposite direction to that proposed for a putative mediating role of cg25536676 (DHCR24) in LDL-C metabolism and type 2 diabetes. Taken together, our results suggest that cg25536676 may be a new DNAm target for the early detection and treatment of type 2 diabetes.

Functional analysis
The in silico analysis using data from the previous EWAS showed that CpGs detected as being potentially causal of type 2 diabetes may be associated with lipid lipoproteins, metabolites and anthropometric and cardiometabolic traits, all of which are involved in the onset of type 2 diabetes. For comparison, CpGs that were likely to be secondary to the effects of type 2 diabetes (in PBX1, SPEG, MIR657 and RBFOX1) based on the results of the 2SMR analysis were enriched for perinatal and neurological traits in addition to cardiometabolic and anthropometric traits. Although not conclusive, these results indicate that DNAm may be involved in the development and progression of type 2 diabetes through different mechanisms related to specific stages in the pathophysiology of the disease. We hypothesise that the observed enrichment for ancestry/ethnicity among type 2 diabetes CpG sites identified in European populations may highlight common confounders or exposures (e.g. differences in cell composition or type 2 diabetes risk factors) between populations [50].

Strengths and limitations
Our study has several strengths. First, we leveraged large existing datasets to extract genetic associations with type 2 diabetes and DNAm. Second, we analysed causality bidirectionally to disentangle the true direction of association and ruled out reverse causation at CpGs previously detected in the context of prevalent type 2 diabetes. We also conducted sensitivity analyses to validate assumptions of the MR analysis and we demonstrated that in each direction of the analysis we were unlikely to be affected by weak instrument bias. In addition, we ran in silico functional analyses to facilitate biological interpretation of the MR findings. Study limitations included the use of a smaller outcome sample in the forward 2SMR analysis. In the reverse 2SMR analysis (DNAm to type 2 diabetes), we had few proxies to predict DNAm at the CpGs of interest (~1 mQTL/ CpG). This prevented us from obtaining stronger evidence of causality and conducting MR sensitivity analyses. Because up to 9.2% overlap was present between the exposure and the outcome samples in the reverse 2SMR analysis, the results need to be interpreted with caution because of potential bias. CpGs were selected from an observational EWAS of type 2 diabetes in European populations, which may limit the generalisability of the results to other population groups. Use of blood DNA rather than metabolically active tissue samples may limit interpretability. Finally, the validity of the observational and causal associations at DHCR24 need to be confirmed before this evidence can be clinically implemented in the early detection and treatment of type 2 diabetes. Future studies would benefit from the use of multiethnic cohorts throughout the observational and causal inference stages of the analysis. Similarly, a twostep 2SMR analysis may be required to investigate the role of DNAm as a potentially causal mediator between known risk factors and type 2 diabetes.

Conclusions
This study assessed causality between DNAm and type 2 diabetes in a bidirectional 2SMR framework at CpG sites identified from a meta-EWAS of prevalent disease. We had more power to identify causality from DNAm to type 2 diabetes than in the reverse direction, for which PBX1 was the signal most likely to be associated with type 2 diabetes. We demonstrated that a CpG at DHCR24, a gene related to the metabolism of lipids, was causally associated with type 2 diabetes. Further validation of causal associations is needed using larger and better-powered samples to extract genetic associations, especially when DNAm is the exposure.