DNA methylation signatures associated with cardiometabolic risk factors in children from India and The Gambia: results from the EMPHASIS study

The prevalence of cardiometabolic disease (CMD) is rising globally, with environmentally induced epigenetic changes suggested to play a role. Few studies have investigated epigenetic associations with CMD risk factors in children from low- and middle-income countries. We sought to identify associations between DNA methylation (DNAm) and CMD risk factors in children from India and The Gambia. Using the Illumina Infinium HumanMethylation 850 K Beadchip array, we interrogated DNAm in 293 Gambian (7–9 years) and 698 Indian (5–7 years) children. We identified differentially methylated CpGs (dmCpGs) associated with systolic blood pressure, fasting insulin, triglycerides and LDL-Cholesterol in the Gambian children; and with insulin sensitivity, insulinogenic index and HDL-Cholesterol in the Indian children. There was no overlap of the dmCpGs between the cohorts. Meta-analysis identified dmCpGs associated with insulin secretion and pulse pressure that were different from cohort-specific dmCpGs. Several differentially methylated regions were associated with diastolic blood pressure, insulin sensitivity and fasting glucose, but these did not overlap with the dmCpGs. We identified significant cis-methQTLs at three LDL-Cholesterol-associated dmCpGs in Gambians; however, methylation did not mediate genotype effects on the CMD outcomes. This study identified cardiometabolic biomarkers associated with differential DNAm in Indian and Gambian children. Most associations were cohort specific, potentially reflecting environmental and ethnic differences.

prevalence has rapidly increased in low-and middleincome countries (LMICs) [1,2]. Globally, the prevalence of type 2 diabetes has increased between 1980 and 2014 [3], but at higher rates in LMICs [1]. The prevalence of childhood hypertension in Central India is reported to be 6.8-7.0% [4] rising to 9.5% in Chennai [5], compared to 4% globally. Furthermore, mortality occurs earlier in LMICs, with the number of years spent living with these conditions increasing [2,3], escalating the societal and individual health burden. The rapid rise in CMD cannot be explained solely by fixed genetic factors, but suggests that environmental factors may contribute, including a change from traditional to western diets, increased intake of processed foods, urbanisation and reduced physical activity [6,7]. Moreover, there is substantial evidence that early life environmental exposures during critical developmental windows modulate CMD risk [8,9]. Exposure to persistent undernutrition, poor quality diets and a high burden of infectious diseases in utero and in early childhood are suggested to induce metabolic adaptations to aid survival. However, these adaptations may be detrimental in later life, limiting metabolic capacity in response to an obesogenic environment [10]. The early onset of cardiovascular and metabolic conditions in adults from LMICs compared to HICs may reflect such adverse early life adaptations.
The environment can influence phenotype through epigenetic processes. The most widely studied epigenetic mechanism is DNA methylation (DNAm), with evidence from both human and animal studies linking environmental exposures to DNAm and metabolic changes and altered CMD risk susceptibility [11][12][13]. In humans, candidate gene and epigenome-wide association studies (EWAS) have identified robust associations between DNAm and CMD traits in adulthood, which have been replicated across cohorts [14][15][16]. However, EWAS have primarily been carried out in HIC cohorts, with limited analysis of individuals from LMICs. As DNAm is influenced by both the environment and genotype [17], the extent to which methylation markers of CMD traits from HIC can be extrapolated to LMICs is unknown. Moreover, previous EWAS have focussed on CMD-associated DNAm changes in adults. Limited studies have examined DNAm in children, where the influence of early life environmental exposures may be stronger, with the possibility to detect methylation signatures associated with sub-clinical changes in metabolic function before disease onset.
In this study, we analysed DNAm in children from the EMPHASIS study [18] (Epigenetic Mechanisms linking Pre-conceptional nutrition and Health ASsessed in India and sub-Saharan Africa; ISRCTN14266771) which includes two LMIC cohorts, one each from India and The Gambia.
Previously, we investigated the effect of maternal micronutrient supplementation on DNAm in their children [19]. Here, we sought to investigate associations of DNAm with cardiometabolic risk markers in children from each cohort, and in both cohorts combined through a meta-analysis. We also examined the potential influence of genetic variants and maternal micronutrient intervention at associated loci.

Cohort characteristics
The characteristics of the children in the two cohorts are summarised in Table 1 and stratified by sex in Additional file 1: Table S1. There were 289 Gambian children (53.6% male), with a median age of 9.0 years and 686 Indian children (55.1% male), with a median age of 5.8 years. Mean blood pressure (systolic, diastolic and pulse pressure) was generally lower in the Indian children compared to the Gambian children. The Indian children also showed lower fasting, 30-min and 120-min glucose levels during an OGTT, whereas the Gambian children showed lower fasting insulin levels. Triglyceride and LDL levels were higher in the Indian children, whereas HDL levels were higher in the Gambian children.

Gambian EWAS
DNA methylation was examined using the Illumina Infinium HumanMethylation 850 K Beadchip array in peripheral blood samples from the Gambian children, and robust linear regression used to identify associations between DNA methylation and concurrent cardiometabolic risk factors. A full list of significant dmCpGs (FDR < 0.05) can be found in Table 2, alongside equivalent statistics from the Indian cohort. There were no significant sex interactions with the dmCpGs identified in the Gambian children. Further details are described below.

Glucose levels
There were no significant associations of DNAm with children's fasting, 30-min or 120-min glucose levels.

Insulin levels
There were two dmCpGs associated with fasting insulin levels: cg22388948 in the body of the Family with Sequence Similarity 46 Member A (FAM46A) gene (FDR = 0.014); and cg13934266 in the body of the LysM Domain Containing 2 (LYSMD2) gene (FDR = 0.022). A 1 pmol/l increase in fasting insulin was associated with a 1.37% increase in methylation of cg22388948 (95% CI = 0.81, 1.94) and with a 0.13% decrease in methylation of cg13934266 (95% CI = -0.18, -0.09). These dmCpGs were not associated with fasting insulin levels in the Indian children. There were no significant associations with insulin sensitivity, or insulinogenic index, unadjusted or adjusted for HOMA2-S, in the Gambian children.

Lipid levels
DNAm at CpG cg15237100 located in an intergenic region on chromosome 15 was associated with triglyceride levels (FDR = 0.031). A 0.1 mmol/l increase in triglycerides was associated with a 4.4% decrease in methylation of cg15237100 (95% CI = −0.63, −0.25). There were six dmCpGs associated with LDL-Cholesterol levels. The two most significant dmCpGs were cg01469688 in the promoter of the Suppressor Of Cancer Cell Invasion (SCAI) gene (FDR = 0.004), and cg06952751 in the promoter of the C18orf8 gene (FDR = 0.004), with a 0.1 mmol/l increase in LDL-Cholesterol associated with a 1.3% and 3.8% increase in methylation of cg01469688 and  Table 2 lists all the significant dmCpGs from the Indian EWAS alongside equivalent statistics in the Gambian cohort. Some evidence of significant sex interactions was identified in the Indian children. Further details are described below.

Blood pressure
There were no significant associations between DNAm and SBP, DBP or PP in the Indian cohort.

Glucose levels
No significant associations were detected between DNAm and child's fasting, 30-min or 120-min glucose levels.

Insulin levels
There were no associations between DNAm and child's fasting insulin and insulin 30-min after an OGTT. The CpG cg10304969, in the body of the Transmembrane protein 57 (TMEM57) gene, was associated with insulin sensitivity (FDR = 0.018), where a 1 unit decrease in HOMA2-S was associated with a 0.73% decrease in methylation (95% CI = −0.98, −0.049). cg22982428, in an intergenic region on chromosome 2, was associated with insulinogenic index adjusted for HOMA2-S (FDR = 0.022), where a 1 unit increase was associated with a 0.3% increase in methylation (95% CI = 0.19, 0.41). These CpGs were not significant in the Gambian cohort. There were no dmCpGs associated with the insulinogenic index in the Indian cohort.

Lipid levels
There were no significant associations with triglyceride or LDL-Cholesterol levels. One CpG was associated with HDL-Cholesterol levels (Fig. 2a); cg04988216 in the body of the Receptor Tyrosine Kinase Like Orphan Receptor 1 (ROR1) gene (FDR = 0.019, Fig. 2b), where a 1 mmol/l increase in HDL-C was associated with a 0.96% decrease in methylation (95% CI = −1.27, −0.64). Figure 2b suggests this association may be influenced by two outlier samples with a lower methylation beta value relative to the rest of the samples. The Table 2 List of significant differentially methylated CpGs associated with various CMD markers The gene name corresponds to the nearest gene as annotated in the Illumina EPIC array manifest. CMD = cardiometabolic disease; Coef = regression beta coefficient; FDR = Benjamini-Hochberg false discovery rate; LDL-C = Low-Density Lipoprotein Cholesterol; HOMA2-S = Insulin Sensitivity; HDL-C = High-Density Lipoprotein cholesterol; dmCpG = differentially methylated CpG. Regression coefficients were calculated using beta values in the regression models for the dmCpGs to obtain interpretable coefficients *Denotes the association between the CpG and outcome was confounded by genotype association was more significant (FDR = 0.003, Fig. 2c) after removal of these two samples. Furthermore, methylation at cg04988216 in ROR1 gene showed a significant interaction with sex (p = 2.37 × 10 -3 , Fig. 2d), with methylation significantly associated with HDL-Cholesterol levels in the females (p = 2.67 × 10 -6 ) but not the males (p = 0.24). The CpG cg04988216 was not significant in the Gambian children.

Differentially methylated regions
To identify regional differences in DNA methylation, DMRcate was used to identify DMRs associated with the CMD markers in the children (Table 3). In the Gambian children, there was one 69 bp DMR comprising three CpGs associated with DBP (Stouffer < 0.05), located 1 kb upstream from the transcriptional start site of the Ephrin A1 (EFNA1) gene. One 576 bp DMR comprising nine CpGs was associated with fasting insulin levels, located in the promoter of the C8orf31 gene. In the Indian children, one 119 bp DMR consisting of two CpGs in an intergenic region on chromosome 20 was associated with fasting glucose. Eight DMRs were significantly associated with HOMA2-S, with the top DMR comprising two CpGs located in a 13 bp intergenic region on chromosome 22. Of the DMRs identified, there were no overlaps between the cohorts. Furthermore, no DMRs were in close proximity to identified dmCpGs.

Meta-analysis
A meta-analysis was carried out to identify associations between DNA methylation and cardiometabolic outcomes common across both cohorts (Table 4). There were two dmCpGs significantly associated with the insulinogenic index (Fig. 3a, b). These were: cg04859490 (FDR = 0.029, HetI 2 = 0, Fig. 3c) in intron 8 of the Carboxypeptidase A4 (CPA4) gene; and cg00363845 (FDR = 0.029, Het I 2 = 51.7, Fig. 3d) in intron 1 of the GTP binding protein 3 (GTPBP3) gene. Figure 3c suggests that cg04859490 may be influenced by genotype in the Indian cohort, and we did find evidence of methQTL effects in trans at this locus, although these were not genome-wide significant (Additional file 1: Table S2). In a sensitivity analysis, inclusion of genotype at these nominally associated SNPs into the regression models did not influence the effect size of the association between cg04859590 and the insulinogenic index in the Indian children (Additional file 1: Table S3). One dmCpG was found to be significantly associated with pulse pressure in the meta-analysis: cg14997376 (Het I 2 = 0) in exon 39 of the Citron Rho-interacting kinase (CIT) gene.

Links to early environment and preconception nutritional supplementation
Various environmental exposures in early life have been linked to DNAm changes in children [13,20,21]. The mothers in both cohorts received a nutritional intervention pre-and during pregnancy [19]. Therefore, we examined whether the identified dmCpGs were associated with this intervention. None of the dmCpGs were associated with the intervention. Season of conception (SoC) is associated with DNA methylation signatures in Gambian children [22]. In the Gambian cohort, inclusion of SoC as a covariate did not influence the significance of the primary associations between CMD markers and DNAm at the dmCpGs, and no significant interaction between SoC and DNAm was observed.

methQTL analysis
As DNA methylation at specific loci can be influenced by genotype, we used GEM to identify methylation quantitative trait loci (meQTLs) associated with the cardiometabolic dmCpGs identified in the two cohorts. For the Indian cohort, we analysed associations between the 15 CMD identified dmCpGs and 4,312,147 SNPs. No significant methQTLs were identified in cis-or trans-with a Bonferroni threshold of 2.4 × 10 -8 and 2.8 × 10 -9 , respectively (see Methods for further details). For the Gambian cohort, we analysed associations between the 15 dmCpGs and 4,555,414 SNPs. We identified a total of 79 cis-methQTLs using a Bonferroni threshold of p = 2.4 × 10 -8 (Additional file 1: Table S4): 44 methQTLs associated with cg00368636, 14 with cg13135286 and 21 with cg13819288. No trans-methQTLs were identified using a Bonferroni threshold p = 2.8 × 10 -9 . A sensitivity analysis demonstrated that effect sizes reported in the main analysis for cg00368636 and cg13819288 were not significantly affected by cis-methQTL effects, while the effect size for the LDL-cholesterol associated cg13135286 was significantly reduced after adjustment for the multiple cis-methQTLs (Additional file 1: Table S5). Additional investigation revealed that seven of the fourteen identified cg13135286-associated methQTLs were directly associated with LDL-cholesterol (Additional file 1: Table S6). We conducted further analyses to investigate the effect of these methQTLs on the reported dmCpG association (Additional file 4: Figure S3A) and observed that the methQTLs clustered into three linkage disequilibrium (LD) blocks (Additional file 1: Table S7, Additional file 4: Figure S3B, C). A mediation analysis was carried out to determine if the genotype associations with LDL-Cholesterol levels were mediated by methylation at cg13135286. We found no evidence that cg13135286 mediated the effect of the three tag SNPs (rs75332983, rs614038 and rs6659203) on LDL-Cholesterol levels (Additional file 1: Table S8).

Discussion
In this study, we report findings from the EMPHA-SIS study, investigating associations between DNAm in children from two LMICs with concurrent measures of cardiometabolic risk factors. We identified novel methylation changes associated with CMD risk factors including blood pressure, insulin sensitivity and lipids at both single CpG and regional levels in the individual cohorts, and common methylation changes associated with insulin secretion and pulse pressure in a meta-analysis. These findings may provide insights into molecular pathways associated with CMD in two understudied LMIC populations. EWAS analysis identified associations between DNAm and blood pressure. The meta-analysis identified a dmCpG within the CIT gene associated with pulse pressure, a measure of arterial stiffness. Furthermore, in the Gambian cohort, we found significant associations between CpGs associated with the MED22 and EGLN2 genes with SBP; and a DMR, comprising three CpGs, within the EFNA1 gene associated with DBP. While MED22 has not previously been linked to vascular function, EGLN2 is involved in regulating hypoxia tolerance and apoptosis in cardiac and skeletal muscle [23]. EFNA1, an EPH receptor protein-tyrosine kinase, is highly expressed in vascular smooth muscle, triggering EPHA4 signalling and stress fibre assembly [24]. However, no studies have linked the genetic/epigenetic regulation of EFNA1 with blood pressure. In adults, associations between DNAm and blood pressure have been reported in several cohorts [25,26], in which trans-ethnic differences were identified [25]. However, the dmCpGs identified here did not overlap with those reported by Kazmi et al. [25] in Europeans and South Asian men or those reported in a meta-analysis of multiple African American cohorts [26].
The insulinogenic index is a measure of first-phase insulin secretion. In the meta-analysis, there were associations between CpGs within the CPA4 and GTPBP3 genes and insulinogenic index. Carboxypeptidase A4 (CPA4) is an exopeptidase, negatively regulating adipogenesis and downregulated during adipocyte differentiation by FGF1 [27]. CPA4 expression in adipose tissue is inversely correlated with insulin sensitivity, implicating CPA4 in maintaining local and systemic insulin sensitivity [27]. Canonically, DNA methylation across the promoter represses gene expression, while gene body methylation is generally positively associated with expression [28]. This suggests that higher methylation at this CpG may increase CPA4 expression, resulting in reduced insulin sensitivity driving increased insulin secretion in children with good pancreatic reserve, consistent with the findings reported by He et al. [27]. This is supported by the absence of association with insulin secretion adjusted for insulin sensitivity. GTPBP3 is involved in mitochondrial tRNA modification, with decreased expression associated with reduced oxygen consumption and ATP production [29]. As increased ATP is necessary for the membranedependent increases in cytosolic Ca 2+ , the main trigger of insulin exocytosis [30], altered epigenetic regulation of GTPBP3 may have downstream effects on insulin secretion. Furthermore, we found associations between DNAm and insulin measures in the individual cohorts. In the Gambian cohort, two dmCpGs were associated with fasting insulin levels, while in the Indian cohort there were associations with HOMA2-S, insulinogenic index adjusted for HOMA2-S and fasting glucose, with no overlap between the cohorts. Many of the dmCpGs and DMRs were located within intergenic regions, so their potential influence on gene expression and on insulin/ glucose homeostasis is currently unclear.
Although there were no associations between DNAm and lipid levels in the meta-analysis, there were associations in the individual cohorts. Six dmCpGs were significantly associated with LDL-Cholesterol levels in the Gambian children, of which cg01469688, located in the promoter of the SCAI gene, was the most significant. SCAI is a transcriptional modulator regulating myocardin, implicated in cardiac hypertrophy [31] and hypertension [32]. There are many possible interpretations of this observation, including that LDL-Cholesterol may influence the epigenetic regulation of this cardiac transcriptional regulator, contributing to the development of CVD. Alternatively, SCAI transcription could influence methylation, with the dmCpG serving as a biomarker of cardiovascular stress associated with LDL-Cholesterol levels. In the Indian cohort, there were no associations of DNAm with triglycerides or LDL-Cholesterol levels. However, cg04988216 within the body of the ROR1 gene was negatively correlated with HDL-Cholesterol levels. ROR1 plays an essential role in skeletal and cardiac development [33]. Moreover, Sánchez-Solana et al. [34] have shown that inhibition of ROR1 modulates ERK1/2 activity in mice, regulating the expression of glucose transporters 1 and 4. Decreased gene body methylation within ROR1 may indicate decreased expression of ROR1, subsequently promoting increased adipogenesis in those with low HDL-Cholesterol levels, affecting susceptibility to later life metabolic disease.
In adults, previous EWAS have identified robust associations between CMD risk markers and DNAm at key genes in lipid metabolism [35]. Moreover, several of these are associated with increased CMD incidence. For example, CpGs within CPT1A were associated with the metabolic syndrome [36] and plasma adiponectin, a biomarker for CMD/CVD risk [37]. A recent Mendelian randomisation study has suggested a causal effect of methylation at ABCG1 on BMI and lipid levels [14]. We did not find associations at these previously reported CpGs in our study, possibly because we measured DNAm in children without overt indications of CMD. Furthermore, the range of physiological and biochemical measures in children is smaller and presumably subject to tighter metabolic homeostasis than in adults, due to the absence of comorbidities. Moreover, the children here are from LMICs where there have been limited epigenetic studies, and differences in genotype and environment may contribute to DNAm differences associated with CMD traits.
The marked differences observed between the results in the two cohorts could reflect potential populationspecific phenotypic differences. Studies have shown that Indians have greater body fat and central obesity compared to black African-Caribbean populations [38,39]. This is reflected in higher plasma non-esterified fatty acids and triglycerides, hyperinsulinemia and IR during fasting and post-glucose challenge in the Indian population [38]. DNAm differences between the cohorts could influence, or be influenced by, the different distributions of these cardiometabolic markers, and may mark corresponding differences in IR progression.
Differences in DNAm between the populations may also result from differences in diet, environment and/or genotype. The Indian children were living in overcrowded urban slums with high levels of air pollution, which may affect the methylome [40]. In contrast, the Gambian children are from a remote rural area where the food supply is heavily season dependent [22]. However, the dmCpGs identified here were not associated with SoC or maternal pre-conceptional and pregnancy micronutrient intervention [19]. Additionally, we found limited evidence that the dmCpGs were influenced by measured genetic variation, with only three dmCpGs in the Gambian analysis having significantly associated methQTLs. We also found no evidence that methylation mediates an effect of genotype at a single CpG in Gambians where both the CpG and methQTLs were associated with LDL-Cholesterol. We note that differences in power due to the varied sample sizes of the two cohorts are unlikely to underlie the contrasting findings since effect sizes and p values are markedly different for dmCpGs identified in one cohort or the other ( Table 2).
There are several strengths to this study. Firstly, we were able to analyse an extensive set of blood-derived markers and phenotype measures on the children, allowing detailed assessment of the relationship between DNAm and CMD markers in childhood. Secondly, investigating associations in children from two LMICs gives an opportunity to assess methylation changes in two understudied populations and for inter-cohort comparison. Although replication in HIC cohorts is possible, environmental and lifestyle differences between LMICs and HICs would be likely to confound the results. Limitations of the study include the lack of suitable cohorts from LMICs with both methylation data and CMD measures in childhood. Furthermore, DNAm was measured in peripheral blood, which has limited relevance to the aetiology of CMD traits. We also have no earlier measures of phenotypes or DNAm in the children, so cannot investigate temporal relationships. However, while we found no evidence of causal relationships, methylation changes presented here, if replicated, might serve as useful biomarkers for identifying individuals at increased risk of CMD.

Conclusions
We carried out a comprehensive analysis of the relationship between concurrent DNA methylation in peripheral blood and measures of cardiometabolic health in children from two LMICs. We identify both cohort-specific and common associations. With further replication, identified methylation changes during early childhood may serve as biomarkers of future CMD risk and may provide insights into molecular pathways leading to CMD in later life.

Study cohorts
The EMPHASIS study [18,19] comprises two cohorts of children born to mothers who took part in separate randomised controlled trials of nutritional supplementation before and during pregnancy. The original trials were: the Mumbai Maternal Nutrition Project (MMNP) (also known as project SARAS-ISRCTN62811278) among women living in slums in the city of Mumbai, India; and the Peri-conceptional Multiple Micronutrient Supplementation Trial (PMMST-ISRCTN13687662) among women living in rural West Kiang, The Gambia. The Mumbai children have been followed up at 5-7 years of age ("SARAS KIDS" study) and data and samples for the first 700 children studied from the per protocol group (children whose mothers started supplementation at least 3 months prior to conception) [41] have been used in the EMPHASIS study. The Gambian children were followed up aged 7-9 years in 2016; all 299 Gambian children retraced from the PMMST group were included in the study. The current investigation is a part of the Stage 2 analysis of the EMPHASIS study [18].

Physiological and biochemical measurements
Full details of the physiological and biochemical procedures carried out are described in Additional file 5: Methods. Briefly, blood pressure was measured using an Omron HEM 7080 and Omron 705IT in the Gambian and Indian children, respectively. In both cohorts, fasting blood samples were collected. For the oral glucose tolerance test (OGTT), an oral anhydrous glucose load of 1.75 g/kg body weight was administered, after which blood samples were collected at 30 and 120 min. In the Gambian cohort, glucose and lipid concentrations were measured using the COBAS INTEGRA ® 400 plus analyser (Roche Diagnostics, USA). Insulin was measured using the VITROS 350 Analyzer (Ortho Clinical Diagnostics, USA). In the Indian cohort, plasma glucose concentrations were measured using standard enzymatic methods; insulin using ELISA kits (Mercodia, Sweden); and lipids (HDL-/LDL-Cholesterol and triglycerides) using ready-to-use kits (Dialab, Austria).

Derived variables
Insulin sensitivity (HOMA2-S) was derived from fasting glucose and insulin using the Oxford calculator (https:// www. phc. ox. ac. uk/ resea rch/ techn ology-outpu ts/ ihoma2). Two measures of first-phase insulin secretion were derived: the Insulinogenic index [42] and the Insulinogenic index adjusted for insulin sensitivity, calculated as the residual of insulinogenic index regressed on HOMA2-S. See Additional file 5: Methods for further details.

Processing of outcome variables
Distributions of physiological variables were checked and log-transformed if necessary. Residuals were generated for all physiological variables by adjusting for the child's age, sex and current height and body mass index (BMI) where such associations were statistically significant and used in the final regression models. Further details are in Additional file 5: Methods.
Epigenome-wide DNA methylation quality control (QC) and pre-processing DNA was extracted from peripheral blood samples as previously described [19]. Epigenome-wide DNA methylation profiling was performed for a total of 698 Indian and 293 Gambian samples using the Human Meth-ylationEPIC BeadChip platform (Illumina, USA). Full details of QC and pre-processing are described in Saffari et al. [19]. Briefly, the raw.idat files were processed in R (v3.5.2) using the Bioconductor package meffil [43]. Sex mismatches (5 Indian, 0 Gambian samples) and outlying arrays (7 Indian, 4 Gambian samples) were excluded. Probes with a low detection p value or bead numbers (1494 and 2635 probes in Indian and Gambian data, respectively), mapping to sex chromosomes and/or cross-reactive (61,523 and 61,225 probes in Indian and Gambian data, respectively) were excluded. After preprocessing and QC, this left 686 samples and 803,120 probes in the Indian cohort, and 289 samples and 802,283 probes in the Gambian cohort.

Site-level differential methylation analysis
For EWAS analysis, robust regression models were run using limma (v3.38.3) [44] with methylation M values as the outcome variable due to their superior distributional properties for linear regression modelling in differential methylation analysis [45]. Models were adjusted for child's sex, age at measurement and the first ten principle components from a methylation principle component analysis (PCA) derived from the 200,000 most variable probes to account for technical covariates and white blood cell composition [19]. The analysis was controlled for multiple testing with the Benjamini-Hochberg adjustment for false discovery rate, with a significance threshold of an FDR < 0.05. Inflation of p values was assessed (lambda), Quantile-Quantile (Q-Q) plots generated and bacon (v1.10.1) [46] used to control for genomic inflation of test statistics where lambda > 1.2. Effect sizes for site-level analysis used methylation beta values to aid interpretation [45]. An additional investigation of interactions between methylation and sex was carried out for significant differentially methylated CpGs (dmCpGs) only.

Regional-level differential methylation analysis
The methylation status of adjacent CpG sites can be highly correlated, often with functional relevance and analysis of regional changes in methylation can provide increased statistical power. DMRcate (v1.18.0) [47] was used for the identification of differentially methylated regions (DMRs) with respect to the different measures of cardiometabolic health. DMRcate uses Stouffer's method for combining p values for CpGs, with a Stouffer < 0.05 threshold being used as statistically significant [19].

Sensitivity analyses
To identify sex dependent methylation effects, we investigated potential sex-interaction effects (outcome x child sex) on the significant dmCpGs identified in the main analysis using the following regression model: Furthermore, it has previously been reported that season of conception in The Gambia influences infant's DNA methylation and early growth. Therefore, the effect of season of conception (SoC) on the outcome-associated loci was tested by inclusion as a covariate in the main regression model and secondly, by inclusion of an interaction between DNA methylation and SoC as below: A Bonferroni-adjusted significant threshold: p < 2.8 × 10 -3 (0.05/18) was used to adjust for multiple testing of the 18 dmCpGs investigated for sex interactions.

Meta-analysis
To examine common associations across cohorts, effect size estimates from individual EWAS were meta-analysed using METAL [48] with an inverse variance weighting. Correction for inflation of both input and meta-analysis output statistics was performed using double genomic control (GC). We explored heterogeneity between the two cohorts using the I 2 statistic [49,50].

Genotyping
SNP genotypes for 293 Gambian and 698 Indian samples were generated using the Infinium Global Screening Array-24 v1.0 BeadChip array (Illumina, USA) and imputed against a 1000 Genomes Phase 3 reference panel using IMPUTE2 (v2.3.2). Full details of QC and pre-processing can be found in Saffari et al. [19] and in Additional file 5: Methods. The final imputed data sets comprised 284 samples with 4,555,414 SNPs in the Gambian cohort, and 686 samples with 4,312,147 SNPs in the Indian cohort.

Methylation quantitative trait loci (methQTL) analysis
methQTL analysis was carried out by the GEM package (v1.10.0), using an additive model [17]. The analysis was restricted to the significant dmCpGs identified in single cohort and/or meta-analyses. Separate analyses in cis (SNP ± 2 Mb from the CpG) and trans (all other SNPs) were conducted to maximise power. Significant cis-methQTLs were those with a Bonferroni-adjusted p value < 2.4 × 10 -8 , while trans-methQTLs were those with a Bonferroni-adjusted p < 2.8 × 10 -9 . Full details are described in Additional file 5: Methods.
In order to minimise non-genetic variation in the DNA methylation data, we regressed out the effect of child sex, plus the first 10 principal components (PCs) from an unsupervised PCA of the methylation data, prior to performing the methQTL analysis. The resulting methylation residuals were then rank transformed and centred to have mean 0 and variance 1 [51]. The methQTL analysis was carried out using the GEM package (v1.10.0) from Bioconductor which uses an additive (allelic dose) model to test for SNP-methylation associations [17]. This analysis was restricted to all 18 CpGs identified in single cohort and/or meta-analyses.
Separate analyses in cis (SNP ± 2 Mb from the CpG) and trans (all other SNPs) were conducted to maximise power. Significant cis-methQTLs were those with a Bonferroni-adjusted p value < 2.4 × 10 -8 , while trans-methQTLs were those with a Bonferroni-adjusted p < 2.8 × 10 -9 .
i. Identification of cis-methQTL Only SNPs within + / − 2 MB of a CpG of interest were considered for this analysis. Significant cis-methQTL are those with an association p value passing a Bonferroni-adjusted significance threshold of p = 0.05/n_ SNPs/n_CpGs, where n_SNPs is the number of cis-SNPs and n_CpGs is the total number of CpGs tested (n = 18 CpGs).
ii. Identification of trans-methQTL trans-methQTL are methQTL with an association p value passing a genome-wide Bonferroni-adjusted threshold of p = 5 × 10 -8 /n_CpGs that do not fall within the set of cis-methQTL identified in i) above.
For CpGs with significant methQTLs, we conducted a sensitivity analysis to see if methQTL effects significantly changed effect sizes for the CpG-outcome associations reported in the main EWAS analysis. We did this by repeating the main analysis with the methQTL-SNP as an additional covariate and comparing original effect size estimates with 95% confidence intervals for the effect size in the adjusted model.
We also investigated associations between identified methQTLs and the traits associated with the methQTLassociated CpG identified in the main EWAS, using