Identification of blood autosomal cis-expression quantitative trait methylation (cis-eQTMs) in children

Background The identification of expression quantitative trait methylation (eQTMs), defined as correlations between gene expression and DNA methylation levels, might help the biological interpretation of epigenome-wide association studies (EWAS). We aimed to identify autosomal cis-eQTMs in child blood, using data from 832 children of the Human Early Life Exposome (HELIX) project. Methods Blood DNA methylation and gene expression were measured with the Illumina 450K and the Affymetrix HTA v2 arrays, respectively. The relationship between methylation levels and expression of nearby genes (transcription start site (TSS) within a window of 1 Mb) was assessed by fitting 13.6 M linear regressions adjusting for sex, age, and cohort. Results We identified 63,831 autosomal cis-eQTMs, representing 35,228 unique CpGs and 11,071 unique transcript clusters (TCs, genes). 74.3% of these cis-eQTMs were located at <250 kb, 60.0% showed an inverse relationship and 23.9% had at least one genetic variant associated with the methylation and expression levels. They were enriched for active blood regulatory regions. Adjusting for cellular composition decreased the number of cis-eQTMs to 37.7%, suggesting that some of them were cell type-specific. The overlap of child blood cis-eQTMs with those described in adults was small, and child and adult shared cis-eQTMs tended to be proximal to the TSS, enriched for genetic variants and with lower cell type specificity. Only half of the cis-eQTMs could be captured through annotation to the closest gene. Conclusions This catalogue of blood autosomal cis-eQTMs in children can help the biological interpretation of EWAS findings, and is publicly available at https://helixomics.isglobal.org/.


Background
Cells from the same individual, although sharing the same genome sequence, differentiate into diverse lineages that finally give place to specific cell types with unique functions. This is orchestrated by the epigenome, which regulates gene expression in a cell/tissue and timespecific manner [1][2][3]. Besides the central role of the epigenome in regulating embryonic and fetal development, X-chromosome inactivation, genomic imprinting, and silencing of repetitive DNA elements, it is also responsible for the plasticity and cellular memory in response to environmental perturbations [1][2][3]. When this happens during prenatal or early life, the re-programmed epigenome might not match the later environment, and this can lead to increased disease risk. This is known as the Developmental Origins of Health and Disease (DoHAD) [4]. In addition to the plasticity or re-programming hypothesis, environmental insults experienced during prenatal life may directly disrupt the correct development of organs without a homeostatic response, which might also have long-term consequences on health [4].
Massive epigenetic alterations, caused by somatic mutations or as a result of age and/or injury, were initially described in cancer [3]. The paradigm of environmental factors modifying the epigenome and leading to increased disease risk was then extrapolated from cancer to a wide range of common diseases. As a consequence, a high number of epigenome-wide association studies (EWAS) investigating the association between prenatal and postnatal exposure to environmental factors and DNA methylation, and between DNA methylation and disease [1,3] have been performed in he recent years. EWAS have found thousands of associations between DNA methylation and environmental exposures or disease, which have been inventoried in two catalogues: the EWAS catalogue [5] and the EWAS Atlas [6].
For instance, the latter includes 0.5 million associations for 498 traits from 1,216 studies, including 155 different cells/tissues.
Despite the success of EWAS in identifying altered methylation patterns, the role of genetic background, access to the target tissue/cell, confounding, reverse causation and biological interpretation are still challenging issues [1,3]. Regarding the latter, most studies do not dispose of transcriptional data to test the consequences of DNA methylation changes in gene expression. When gene expression is not available, a common approach is to assume that CpGs affect the expression of the closest gene [7]. Although this approach is easy to implement, it is limited in the fact that CpGs might regulate distant genes or might not regulate any gene at all [1,8]. Another approach to interpreting findings is to search for regulated genes of candidate CpGs in expression quantitative trait methylation (eQTM) studies, i.e. genome-wide studies investigating the associations between DNA methylation levels and gene expression [9,10]. Several eQTM studies have been published in diverse cell types/tissues: whole blood [8,11], monocytes [11][12][13], lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords [14,15], fibroblasts [16], liver [17], skeletal muscle [18], nasal airway epithelium [19] and placenta [20]. As most of the EWAS are conducted in whole blood [6,21], there is a need for comprehensive eQTM studies in this tissue. Available whole blood eQTM studies to date only cover samples from adults [8,11] and their validity in children has not been assessed. Besides, they do not consider the effect of genetics and blood cellular composition.
In this study, we analyzed DNA methylation and gene expression data from the Human Early-Life Exposome (HELIX) project to generate the first blood autosomal cis-eQTM catalogue in children (https://helixomics.isglobal.org/). We characterized child blood autosomal cis-eQTMs at the molecular level, compared them with eQTMs identified in adults, analyzed the proportion of cis-eQTM CpG-gene pairs captured through annotation to the closest gene, and assessed the influence of genetic variation and blood cell type specificity on the association between methylation and gene expression in eQTMs. An 8 overview of all the analyses can be found in Figure 1. This public resource will help the functional interpretation of EWAS findings in children. Figure 1. Analysis workflow. The figure summarizes the analyses conducted in this study. The first step was the identification of blood autosomal cis-eQTMs (1 Mb window) in 823 European ancestry children from the HELIX project by running two models: the main model adjusted for age, sex, and cohort (1); and an additional model further adjusted for blood cell type proportions (2). Then, we characterized cis-eQTMs identified in the main model by performing different enrichment analyses (3), analyzing their overlap with CpGs described in the literature (4), evaluating the proportion of cis-eQTM CpG-gene pairs captured through annotation to the closest gene (5), assessing the effect of genetic variants (6) and age by checking the overlap with eQTMs described in adults (7). Finally, we used cis-eQTMs identified in the additional model to investigate blood cell type specificity (8).

Study population and molecular data
The study includes 823 children of European ancestry from the HELIX project with available DNA methylation and gene expression data. These children, enrolled in 6 cohorts, were aged between 6 and 11 years old and were sex balanced (Table 1).
We also classified them as invariant (45.0%) or variant (55.0%) based on a methylation range threshold of 0.05 points, measured as the difference between the methylation values in percentile 1 and percentile 99 [23]. As expected, CpGs with medium methylation levels, which likely represent CpGs whose methylation status changes among blood cell types or which are influenced by genetic variants, showed higher variability in the population (p-value < 2e-16, Figure S1).  Figure S2).

Overview of blood autosomal cis-eQTMs in children
Identification and classification of blood autosomal cis-eQTMs We tested the association between DNA methylation and gene expression levels of the 13.6 million autosomal TC-CpG pairs through linear regressions. After correcting for multiple testing (see Material and Methods), we identified 63,831 significant child blood autosomal cis-eQTMs (0.47% of total TC-CpG pairs). For simplicity, we will refer to them as eQTMs in the subsequent text. These eQTMs comprised 35,228 unique CpGs and 11,071 unique TCs, of which 7,878 were annotated as coding genes. 38,310 eQTMs (60.0%) showed inverse associations, meaning that higher DNA methylation was associated with lower gene expression. Each TC was associated with a median of 2 CpGs (IQR = 1; 6), while each CpG was associated with a median of 1 TC (IQR = 1; 2) ( Figure S3). As expected, CpGs in eQTMs were enriched for CpGs variable in the population (odds ratio (OR) = 6.1, p-value < 2.2e-16) ( Figure S4) and for TCs with call rates >90% (OR = 5.4, p-value < 2.2e-16) ( Figure   S5).   CpGs in eQTMs are close to their target TC and have modest effects CpGs whose methylation level was associated with gene expression tended to be close to the TSS, being 74.3% of all eQTMs located at <250 kb ( Figure 2). Globally, the median distance between a CpG and a TC TSS in an eQTM was 1.3 kb (IQR = -84 kb; 117 kb). The observed downstream shift can be explained because for each TC we chose the most upstream TSS, which might not represent the TSS that gives the most abundant transcript in the blood ( Figure 2). This shift depended on the direction of the effect: median distance for positive and inverse eQTMs were 6.7 kb (IQR = -79 kb; 103 kb) and 0.8 kb (IQR = -93 kb; 136 kb), respectively. A similar shift was observed for eQTLs (expression quantitative trait loci, i.e. single nucleotide polymorphisms (SNPs) associated with gene expression) in the Genotype-Tissue Expression (GTEx) project [24].
We report the effect size of eQTMs as the log 2 fold change (FC) of gene expression per 0.1 points increase in methylation (or 10 percentile increase). In absolute terms, the median effect size was 0.12, being the minimum 0.002 and the maximum 16.4, and with 91.4% of the eQTMs with an effect size <0.5. A median effect size of 0.12 means that a change of 0.1 points in methylation levels was associated with around a 9% increase/decrease of gene 13 expression. We did not find any association between the effect size and the CpG-TC TSS distance or the relative position to the TSS (upstream or downstream) ( Figure S6).  p-value ZNF.Rpts = 9.48e-58) ( Figure 3B; Figure  CpGs in inverse eQTMs were depleted for bivalent TSSs (OR TssBiv = 0.85, p-value TssBiv = 4.0e-10). Overall, these results suggest that CpGs in eQTMs tend to be in active blood regulatory regions, with CpGs in inverse eQTMs specifically located in promoters (TssAFlnk).
Given that methylation levels characterize regulatory elements, we also evaluated whether methylation levels were related to the presence of eQTMs. We found that CpGs in eQTMs

Genes in eQTMs are involved in immune system functions
To identify which biological functions were regulated by eQTMs, we ran a gene-set enrichment analysis based on the genes annotated to TCs in these eQTMs. 6,675 out of the 11,071 annotated genes in eQTMs were present in gene ontology (GO), leading to 76 enriched GO terms (q-value < 0.001) ( Table S1). As expected, due to the tissue analyzed, 59.2% of the GO terms were related to immune responses (N = 45), followed by GO terms associated with cellular processes (N = 19), and metabolic processes (N = 12). Among immune GO terms, 20 of them were part of innate immunity, 18 of adaptive response and 7 were general/other immune pathways.

CpGs in eQTMs are enriched for CpGs associated with phenotypic traits and/or environmental exposures
We assessed whether CpGs in eQTMs were enriched for CpGs whose methylation levels had been related to phenotypic traits and/or environmental exposures. To this end, we retrieved CpGs from EWAS performed in blood of European ancestry subjects: 143,384 CpGs from the EWAS catalogue [5], and 54,599 CpGs from the EWAS Atlas [6]. Among them, 16,083 and 9,547 CpGs were part of our eQTMs, representing 45.7% and 27.1% of all eQTMs. We found that CpGs in eQTMs were enriched for CpGs in these EWAS databases in comparison to CpGs not in eQTMs (OR EWAS-catalogue = 1.48; p.value EWAS-catalogue < 2e-16; OR EWAS-Atlas = 2.53; p-value EWAS-Atlas < 2-16) ( Figure S10). Enrichment was more pronounced in CpGs in inverse eQTMs than in CpGs in positive eQTMs. Of note, CpGs present in the EWAS catalogue and the EWAS Atlas tended to exhibit medium methylation levels compared to CpGs not listed in the datasets ( Figure S11). Among them, CpGs in eQTMs did not have a different distribution of methylation levels compared to CpGs not in eQTMs.

Annotating CpGs to the closest gene only partially captures eQTMs
A standard approach to interpret EWAS findings is to assume that CpGs regulate the expression of proximal genes. These proximal genes are usually identified through the Illumina 450K annotation [26], which annotates a CpG to a gene when the CpG maps into the gene body, untranslated regions, or promoter region defined as <1,500 bp upstream the TSS. We evaluated to which extent the Illumina 450K annotation captures the eQTMs identified in our catalogue.
To do so, we subsetted 351,909 CpG-TC pairs involving 27,610 unique CpGs and 16,957 unique genes that were present in the Illumina 450K and the Affymetrix HTA 2.0, and thus which could be compared (Table S2). First, we analyzed whether these CpG-TC pairs were more likely to be eQTMs than CpG-TC pairs not annotated to the same gene.

Genetic contribution to child blood autosomal cis-eQTM regulation
We hypothesized that genetic variation might regulate DNA methylation and gene expression in a fraction of child blood autosomal cis-eQTMs. To test this, we used two measures of genetic influence: (1) blood methylation heritability for each CpG calculated from twin designs (total additive heritability) and genetic relationship matrices (SNP heritability) as reported by Van Dongen and colleagues [28], and (2) meQTLs (methylation quantitative trait loci, SNPs associated with DNA methylation levels) identified in the ARIES dataset [29].
First, we found that CpGs in eQTMs had higher total additive and SNP heritabilities than CpGs not in eQTMs (with a median difference of 0.23 and 0.10, respectively, and a p-value < 2e-16 for both) ( Figure 4A and 4B). Moreover, heritabilities were higher in CpGs with a higher number of associated TCs (for each associated TC, total additive and SNP heritabilities increased 0.025 and 0.026 points, respectively, with a p-value < 2e-16 for both).

19
These results suggest that CpGs that regulate the expression of several genes, master regulators, are more prone to be themselves regulated by genetic variation.

Figure 4. Genetic contribution to child blood autosomal cis-eQTMs.
CpGs were grouped by the number of TCs they were associated with, where 0 means that a CpG was not associated with any TC (non-eQTMs). A) Total additive heritability as inferred by Van Dongen and colleagues [28], by each group of CpGs associated with a given number of TCs. B) SNP heritability as inferred by Van Dongen and colleagues [28], by each group of CpGs associated with a given number of TCs. C) Proportion of CpGs having a meQTL (methylation quantitative trait locus), by each group of CpGs associated with a given number of TCs.
Second, we studied whether CpGs in eQTMs were enriched for meQTLs, either cis or trans.
We restricted our analysis to 2,820,145 meQTLs identified in blood samples of children aged due to the linkage disequilibrium. One example of such a SNP-CpG-TC trio is formed by rs11585123-cg15580684-TC01000080.hg.1 (AJAP1), in chromosome 10 ( Figure S13).
Next, we run a gene-set enrichment analysis with the 2,738 genes involved in these trios.
We identified 26 significant GO terms (q-value < 0.001), of which 12 were related to metabolic processes, 8 with immunity (6 innate, 1 adaptive immunity, and 1 general/other), and 6 with cellular processes (Table S3). In contrast to the gene-set analysis performed with genes of all eQTMs, genes of eQTMs under genetic control seem to be deflected to metabolic processes (46.2% vs. 15.8%) rather than to immunity pathways (30.8% vs. 57.9%) ( Table S4).

Effect of blood cellular composition on child autosomal cis-eQTMs
Blood is composed of different cell types that may exhibit different DNA methylation and gene expression patterns. To identify potential cell type-specific eQTMs, we repeated the analyses additionally adjusting for the proportions of the six main blood cell types estimated from the methylation data. We hypothesized that methylation, expression, and cell type 21 proportions will be correlated in cell type-specific eQTMs, but not in eQTMs shared among cell types, and therefore eQTMs significant in the main model but not in the adjusted model would be potential cell type-specific eQTMs ( Figure S14).
After adjusting for blood cellular composition, the number of eQTMs decreased from 63,831 to 39,749 (37.7% reduction) ( Table 3). Most of these 39,749 eQTMs were already detected in the main model unadjusted for cell type proportions, with only 17.9% being novel eQTMs.
Moreover, in the model adjusted for cellular composition, the number of unique CpGs in eQTMs was also reduced substantially (37.6%), while this reduction was less dramatic for TCs (19.7%). Thus, while CpGs were associated with a similar number of TCs in both models, TCs were associated with a lower number of CpGs after adjustment for cell type composition, indicating a loss in the transcriptional complexity ( Figure S15). Percentages are referred to the total number of eQTMs, TCs or CpGs for a given model.
We compared the effect estimates of eQTMs between the two models. For eQTMs significant in both models (model-shared eQTMs, N = 32,625), Pearson's correlation of the effect sizes was very high (r = 0.97, p-value < 2e-16) ( Figure 5A). Pearson's correlation for eQTMs significant only after adjusting for cellular composition (adjusted cell-specific eQTMs, N = 7,124) was lower (r = 0.8, p-value < 2e-16), but the estimates were still comparable, and they were marginally significant in the other model ( Figure 5B). In contrast, Pearson's correlation for eQTMs uniquely found in the main model (unadjusted cell-specific eQTMs, N = 31,206) was much lower (r = 0.54, p-value < 2e-16) with many eQTMs with effect sizes close to zero in the adjusted model ( Figure 5B).   Figure S18.

24
Finally, we checked our hypothesis that CpGs and TCs in eQTMs uniquely identified in the main model (unadjusted cell-specific eQTMs) are blood cell type-specific by contrasting them with data from sorted blood cell types. For this, we retrieved DNA methylation levels from six sorted blood cell types by Reinius and colleagues [30] and gene expression levels from twelve sorted blood cell types from the Blueprint project [31]. We used the log 10 of the F-statistic of a linear regression (see Material and Methods) as a measure of cell type specificity (higher F-statistic, higher specificity), as described elsewhere [32]. CpGs in unadjusted cell-specific eQTMs had higher F-statistics than CpGs in model-shared eQTMs (mean change in log 10 F-statistic = 0.42, p-value < 2e-16) ( Figure S19A). Similarly, genes in unadjusted cell-specific eQTMs (N = 2,278) had higher F-statistics than genes in modelshared eQTMs (N = 5,648) (mean change in log 10 F-statistic = 0.10, p-value = 4.57e-11) ( Figure S19B). This suggests that unadjusted cell-specific eQTMs, eQTMs uniquely found in the unadjusted model, likely represent blood cell type-specific CpG-TC associations.

Influence of age on child blood autosomal cis-eQTMs
To understand the association between methylation and gene expression throughout life, we evaluated whether child blood autosomal cis-eQTMs were enriched for CpGs with variable blood methylation levels during childhood/adolescence. To this end, we used two databases: 14,150 CpGs from the MeDALL project whose methylation levels varied between 0 and 8 years (9,647 CpGs with increased and 4,503 CpGs with decreased methylation) [ Figure 6A). CpGs positively associated with gene expression tended to encompass CpGs showing increased methylation over age in both databases, and vice-versa. This suggests that the change in DNA methylation levels in age variable CpGs, either increase or decrease, is related to activation of expression of genes, rather than to repression. For these CpGs, we did not observe different distributions of median methylation levels between CpGs in eQTMs and CpGs not in eQTMs ( Figure S20). Figure 6. Influence of age on child blood autosomal cis-eQTMs. A) Enrichment of CpGs in child blood autosomal cis-eQTMs for CpGs with variable methylation levels. CpGs in eQTMs were classified in all CpGs in eQTMs (grey); CpGs in inverse eQTMs (yellow); and CpGs in positive eQTMs (green). Age variable CpGs were retrieved from the MeDALL project (from birth to childhood [33]) and the Epidelta project (from birth to adolescence [34]), and they were classified in: variable (CpGs whose methylation change over time); decreased (CpGs whose methylation decreases over time); and increased (CpGs whose methylation increases over time). The y-axis represents the odds ratio (OR) of the enrichment. B) Overlap between autosomal cis-eQTMs identified in adults (GTP: whole blood; MESA: monocytes) [11] with eQTMs identified in children (HELIX). All CpG-gene pairs reported at p-value < 1e-5 in GTP or MESA that could be compared with pairs in HELIX are shown. C) Overlap between blood autosomal cis-eQTMs identified in children (HELIX) with eQTMs identified in adults (GTP: whole blood; MESA: monocytes) [11]. All CpG-gene pairs in HELIX that could be compared with pairs in GTP or MESA are shown. Note: The comparison has been split into two plots because one TC in HELIX can be mapped to different expression probes in GTP and MESA and vice-versa. Only comparable CpG-TC pairs are shown (see Material and Methods).
Subsequently, we evaluated whether child blood autosomal cis-eQTMs were consistent along the life-course. For this, we used data from the adult blood eQTM catalogue of Kennedy and colleagues [11], which, similar to ours and in contrast to the catalogue of Bonder and colleagues [8],  Figure S21). Also, we observed that CpGs in child-specific eQTMs had higher blood cell type specificity compared to age-shared eQTMs (p-value < 2e-16, Figure S22A), but no major differences were observed at the gene expression level (p-value = 0.039, Figure S22B). Both types of eQTMs were enriched for meQTLs, but enrichment was stronger for CpGs in age-shared eQTMs (OR meQTLs (age-shared) = 19.22, p-value meQTLs (age-shared) < 2e-16 and OR meQTLs (child-specific) = 4.99, p-value meQTLs (child-specific) < 2e-16). The enrichment for ROADMAP blood chromatin states was quite similar between the two groups ( Figure S23) can be found in Figure S24. Finally, both types of CpGs were enriched in CpGs variable from birth to childhood/adolescence.
Overall, we found that CpGs in eQTMs were enriched for CpGs whose methylation levels changed from birth to adolescence. Moreover, the overlap between child and adult eQTMs was small: only around 8% of HELIX eQTMs were also described in adults. Child and adult (age-shared) eQTMs tended to be proximal to the TSS, and thus enriched for promoter chromatin states, under the control of genetic variation and common to different blood cells.
In contrast, child-specific eQTMs were located at longer distances from the TSS, enriched for repressed regions, and with higher cell type specificity.

Discussion
In this work, we present the first blood autosomal cis-eQTM catalogue in children. We identified 63,831 eQTMs, representing 35,228 unique CpGs and 11,071 unique TCs. A substantial fraction of these eQTMs was influenced by genetic variation and cellular composition, and the overlap with eQTMs reported in adults was small, indicating that genetics, cellular composition, and age are main factors to be considered in EWAS studies.
The characteristics of the child blood autosomal cis-eQTMs were highly consistent with patterns previously described in other studies. Most of the eQTMs tended to be proximal to the TSS of the gene they were associated with [11,18], but the magnitude of the effect was independent of the distance between the CpG and the TC TSS. Although higher methylation is sometimes assumed to lead to lower expression, we found that around 40% of eQTMs were positively associated with gene expression, a percentage in line with previous results in whole blood (31% [9] and 30% [15]), monocytes (47%) [15], T-cells (31%), lymphoblastoid cell lines (43%) and fibroblasts (49%) from umbilical cord blood [14,15]. CpGs in inverse and positive eQTMs tended to be localized in blood enhancers and other active regulatory regions and not in CpG islands, a pattern also previously reported [9,15]. Despite these common locations, CpGs in inverse eQTMs were specifically found around active TSSs, including the distal promoter and the 5'UTR, while positive-CpGs were localized in gene 29 body regions. These results highlight the importance of the genomic context to infer the direction of the association of DNA methylation and gene expression. Further studies would be needed to investigate the combined effect of several CpGs located in different genomic regions on the expression of each gene. We note, however, that the causal relationship between DNA methylation and gene expression cannot be ruled out from our study. There is some evidence suggesting that DNA methylation could be a consequence of gene expression, as opposed to the often assumed regulation of gene expression by DNA methylation [14,35].
It has been previously reported that a substantial part of eQTMs is influenced by genetic variation. In HELIX, CpGs in eQTMs showed higher heritabilities, especially if they were linked to several TCs, and 32.4% of the eQTMs involved CpGs for which at least one meQTL was found. Given that genetic variation could be the underlying causal explanation of the association between DNA methylation and gene expression, we searched for SNPs simultaneously associated with DNA methylation and gene expression in our data. We DNA methylation and gene expression are cell type-specific [25,30,31,36]. To explore this in our bulk data, we hypothesized that causally related CpG-TC pairs would not be affected by adjustment for cellular composition if their association takes place across cell types, while if it was cell type-specific, it would be attenuated. In consequence, the comparison between unadjusted and adjusted models for cellular composition can provide information about potential cell type-specific eQTMs. Our results seem to support our hypothesis, as eQTMs exclusive to the main unadjusted model, and thus potential cell type-specific, were composed by CpGs and TC with higher cell type specificity (higher F-statistic) in blood cell methylation and expression sorted studies [30,31]. Of note, a high F-statistic can either mean high methylation/expression differences in one particular blood cell type or intermediate methylation/expression differences across several cell types. Besides that, complexity in the transcriptional regulation, assessed as the number of CpGs per TC, was decreased after eliminating the effect of cellular composition. Also, potential cell type-specific eQTMs were located at farther distances from the TSS compared to potential cell shared eQTMs, which were predominantly found in the promoter region. Our findings are consistent with previous literature that describes that promoters act as common regulatory regions across tissues, while enhancers, more distal to the TSS, further tune the expression levels to the needs of each tissue providing additional regulation complexity [37,38]. They are also in line with findings on the regulation of gene expression by genetics, where tissue-shared and tissue-specific eQTLs are enriched for promoters and enhancers, respectively [39].
Nonetheless, additional studies at the single cell level should be performed to further differentiate between shared and cell type-specific eQTMs.
In order to know how eQTMs behave along life-course, we compared blood autosomal cis-eQTMs identified in HELIX children with eQTMs reported by Kennedy and colleagues in whole blood and monocytes from adults [11]. We discarded performing the comparison with the blood eQTMs identified by Bonder and colleagues [8], as their approach has many methodological differences compared to ours, most notably the elimination of the effect of genetic variation on the association of DNA methylation and gene expression. We found that only 8.3% of the child blood eQTMs were also reported in adults. Similarly, a high proportion of adult blood eQTMs was neither present in children (41% in GTP and 64% in MESA). This small overlap between adult and child eQTMs has different explanations. Methodological issues such as gene expression platforms with low overlap, statistical methods, and statistical power might have limited the comparison between adults and children in a comprehensive manner. Also, factors other than age, such as cohort-specific environmental exposures or cellular composition might explain the low concordance. Unsurprisingly, HELIX and MESA presented the highest divergence, as HELIX used whole blood and MESA monocytes. Although both HELIX and GTP used whole blood, we cannot discard that differences in eQTMs are a consequence of cell type dynamics over life-course [40]. Despite the effect of these methodological and confounding factors, it is known that DNA methylation and gene expression changes with age [33,34,36], consequently, we expect partial overlap between adult and child eQTMs. The short list of age-shared eQTMs tended to encompass CpGs located in promoters, with lower cell type specificity and highly regulated by genetic variants. Beyond the differences between age groups at the eQTM level, the overall pattern in regulatory elements was similar between adults and children [9,15]. Finally, we also observed that regulatory CpGs in children (CpGs in eQTMs) usually involved CpGs whose methylation varied between birth and childhood/adolescence and that they tended to activate rather than inactivate transcription over this period. They were also enriched for CpGs found to be related to environmental factors and phenotypic traits.
Our catalogue of child blood autosomal cis-eQTMs is meant to improve the biological interpretation of EWAS conducted in children. It has several strengths compared to previous eQTM studies. First, we reported all CpG-TC pairs we tested in our analysis, as opposed to existing blood eQTM catalogues which only reported pairs passing a p-value threshold [8,11]. Reporting all pairs has several advantages: (1) we do not rely on an arbitrary p-value threshold; (2) pairs without association in our study are available for replication and metaanalysis studies, reducing publication bias; (3) researchers can consider pairs not significant in our dataset but with relevant fold change estimates. Second, we reported which eQTMs are influenced by genetic variation, and thus researchers can take this into account when exploring the relationship between methylation and expression in their data. In contrast, Kennedy and colleagues did not consider genetic variation as a potential explanatory factor for the eQTMs they described [11], while Bonder and colleagues only reported the association of CpGs with gene expression after removing the effect of genetic variation [8].
Third, the catalogue includes both unadjusted and adjusted models for cellular composition, which might help to identify cell type-specific eQTMs. Overall, and after demonstrating that only half of the CpG-gene relationships are captured through annotation to the closest gene, our eQTM catalogue becomes an essential and powerful tool to help researchers interpret their EWAS studies, with a particular focus on childhood.
The catalogue also has some limitations. First, it only covers a fraction of all CpG-TC pairs, as both the methylation and gene expression arrays have limited resolution. For instance, the methylation array only covers 1-2% of all CpGs in the genome; and the gene expression array although includes >60,000 TCs, coding and non-coding, which is not comparable to untargeted measurements through RNA-seq, at least for not low abundant transcripts.
Second, the catalogue does not include sex chromosomes which require more complex analyses to address X-inactivation and sex-specific effects. Third, due to statistical power limitations, only cis effects were tested. Fourth, effect sizes should be considered with caution as the association between DNA methylation and gene expression might be nonlinear [41] and effects might be affected by cellular composition. Finally, we have to acknowledge that the catalogue will be useful for biological interpretation of EWAS, if DNA methylation is not a mere mark of cell memory to past exposures without or with time-limited transcriptional consequences [42].
In summary, besides characterizing child blood autosomal cis-eQTMs and how they are affected by genetics, age, and cellular composition, we provide a unique public resource: a catalogue with the association of 13.6 M CpG-gene pairs, with and without adjustment for cellular composition, and of 1.3 M SNP-CpG-gene trios (https://helixomics.isglobal.org/). This information will improve the biological interpretation of EWAS findings.

Methods
Sample of the study  [49]. All participants in the study signed an ethical consent and the study was approved by the ethical committees of each study area [43].
In the present study, we selected a total of 832 children of European ancestry that had both DNA methylation and gene expression data. Ancestry was determined with cohort-specific self-reported questionnaires.
Biological samples DNA was obtained from buffy coat collected in EDTA tubes at mean age 8.1 years old. were repeated due to their overall low quality.
DNA methylation data was pre-processed using minfi R package [50]. We increased the stringency of the detection p-value threshold to <1e-16, and probes not reaching a 98% call rate were excluded [51]. Two samples were filtered due to overall quality: one had a call rate <98% and the other did not pass quality control parameters of the MethylAid R package [52].
Then, data was normalized with the functional normalization method with Noob background subtraction and dye-bias correction [53]. Then, we checked sex consistency using the shinyMethyl R package [54], genetic consistency of technical duplicates, panel samples, and other samples making use of the genotype probes included in the Infinium HumanMethylation450K BeadChip and the genome-wide genotyping data when available. In total four samples were excluded, two with discordant sex and two with discordant genotypes. Batch effect (slide) was corrected using the ComBat R package [55]. Duplicated samples, one of the samples from the panel study and HapMap samples were removed as well as control probes, probes in sexual chromosomes, probes designed to detect Single Nucleotide Polymorphisms (SNPs) and probes to measure methylation levels at non-CpG sites, giving a final number of 386,518 probes.
CpG annotation was conducted with the IlluminaHumanMethylation450kanno.ilmn-12.hg19 R package [26]. Briefly, this package annotates CpGs to promoter (up to 1500 bp from TSS), After normalization, several quality control checks were performed and four samples with discordant sex and two with low call rates were excluded [56]. One of the samples from the panel study was also eliminated for this analysis. Control probes and probes in sexual chromosomes or probes without chromosome information were excluded. Probes with a DABG (Detected Above Background) p-value <0.05 were considered to have an expression level different from the background, and they were defined as detected. Probes with a call rate <1% were excluded from the analysis. The final dataset consisted of 58,254 TCs.
Gene expression values were log 2 transformed and batch effect controlled by residualizing the effect of surrogate variables calculated with the sva method [57] while protecting for main variables in the study (cohort, age, sex, and blood cellular composition).
Blood cellular composition 37 Main blood cell type proportions (CD4+ and CD8+ T-cells, natural killer cells, monocytes, eosinophils, neutrophils, and B-cells) were estimated using the Houseman algorithm [58] and the Reinius reference panel [30] from raw methylation data.
Genome-wide genotyping Quality control was performed with the PLINK program following standard recommendations [59,60]. We applied the following sample quality controls: sample call rate <97% (N filtered=43), sex concordance (N=8), heterozygosity (N=0), relatedness (N=10, including potential DNA contamination), duplicates (N=19). Then we used the peddy python script to predict ancestry from GWAS data [61]. To do so, 6,642 genetic variants, highly polymorphic among populations, and data from the 1000G project were used [62]. We contrasted ancestry predicted from GWAS with ancestry recorded in the questionnaires. Twelve samples were excluded due to discordances between the two variables.

Identification of child blood autosomal cis-eQTMs
To test associations between DNA methylation levels and gene expression levels in cis (cis-eQTMs), we paired each TC to all CpGs closer than 500 Kb from its TSS, either upstream or downstream. In the main analysis, we fitted for each CpG-TC pair a linear regression model between gene expression and methylation levels adjusted for age, sex, and cohort. A second model was run additionally adjusted for blood cellular composition, estimated from DNA methylation data, as described above.
To ensure that CpGs paired to a higher number of TCs do not have higher chances of being part of an eQTM, multiple-testing was controlled at the CpG level, following a procedure previously applied by Bonder and colleagues [8]. To this end, we generated 100 permuted gene expression datasets and ran our previous linear regression models obtaining 100 permuted p-values for each CpG-TC pair. Then, for each CpG, we selected among all CpG-TC pairs the minimum p-value in each permutation and fitted a beta distribution. Next, for each CpG, we took the minimum p-value observed in the real data and used the beta distribution to compute the probability of observing a smaller p-value. This probability was the adjusted p-value at the CpG level. Finally, we considered as significant those CpGs with empirical p-values significant at 5% false discovery rate (FDR), based on Benjamini-Hochberg. Finally, in order to define significant CpG-TC pairs, we selected the CpG with the maximum p-value which was considered as significant and used this adjusted p-value as the significance threshold. Then, we went back to the beta distributions at the CpG level and selected any CpG-TC pair whose p-value was smaller than the significance threshold. We applied the same process for the model adjusted for cellular composition.
Characterization of the child blood autosomal cis-eQTM catalogue We used different statistical methods to characterize CpGs and TCs of the eQTM catalogue.
A linear regression was run to compare the methylation range vs. methylation levels categories (low, medium, high). Enrichment of CpGs/TCs for regulatory elements were tested using Chi-square tests with CpGs/TCs not in eQTMs as reference, unless otherwise stated. Results with a p-value < 0.05 were considered as significant. Annotation of CpGs to regulatory elements is described in the section "DNA methylation assessment".
We explored the enrichment of CpGs in eQTMs for phenotypic traits and/or environmental exposures using the EWAS catalogue [5] and the EWAS Atlas [6]. We used version 03-07-2019 of the EWAS catalogue and selected those studies conducted in whole or peripheral blood of European ancestry individuals. We downloaded EWAS Atlas data on 27-11-2019 and selected those studies performed in whole blood or peripheral blood of European ancestry individuals or with unreported ancestry. For both catalogues, we considered all associations from selected studies.
We used results from the MeDALL and the Epidelta projects to test whether CpGs in eQTMs were enriched for CpGs variable from birth to childhood or adolescence, respectively. For MeDALL we downloaded data from supplementary material of the following manuscript [33].
For Epidelta, we downloaded the full catalogue (version 2020-07-17) from their website We also tested whether genes in eQTMs were enriched for specific GO terms using the topGO R package [67]. We analyzed GO terms in the biological processes' ontology using the weight01 algorithm, which considers GO terms hierarchy for p-values computation. GO terms with q-value < 0.001 were considered as significant.
Comparison with annotation to close gene We evaluated whether using annotation of CpGs to the closest gene (Illumina annotation) captured eQTM associations. CpGs were annotated to Gene Symbol using the IlluminaHumanMethylation450kanno.ilmn-12.hg19 R package [26], while TCs were annotated to Gene Symbol using the HTA-2.0 Transcript Cluster Annotations Release na36 annotation file from Affymetrix. Given that CpGs and TCs could be annotated to several genes, we considered that a CpG-TC pair was annotated to a comparable gene if at least one of the genes annotated to the CpG matched at least one of the genes annotated to the TC. In total, we identified 351,909 comparable CpG-TC pairs. Then, a Chi-square test was run to compute whether these 351,909 comparable CpG-TC pairs were enriched for CpG-TC pairs being eQTMs.
Next, we evaluated whether the relative position of the CpG in the genic region was related to the expression of the eQTM-linked gene. To do so, the comparable 351,909 CpG-TC pairs were expanded to 411,900 entries. Each entry represented a CpG-TC pair annotated to a unique different gene, thus, for instance a CpG-TC pair annotated to two different genes, was included as two entries. In this expanded CpG-TC pair set, Chi-square tests were run to test the enrichment of CpGs in eQTMs for relative gene positions.
Evaluation of the genetic contribution on child blood autosomal cis-eQTMs We used two approaches to evaluate the influence of genetic effects in child blood autosomal cis-eQTMs. First, we used heritability estimates of CpGs computed by Van Dongen and colleagues [28]. Median total additive and SNP-heritability was compared between CpGs in eQTMs and CpGs not in eQTMs, using a Wilcoxon test. For CpGs in eQTMs, linear regressions between heritability measures (total additive and SNP heritabilities) and the number of TCs associated with each CpG were run.
Second, we tested whether CpGs in eQTMs were more likely regulated by SNPs, thus they were enriched for meQTL. In order to define meQTLs in HELIX, we selected 9.9 M cis and trans SNP-CpG pairs with a p-value < 1e-7 in the ARIES dataset consisting of data from children of 7 years old [29]. In this subset of 9.9 M cis and trans-CpG pairs, we run meQTL analyses using MatrixEQTL R package [68], adjusting for cohort, sex, age, blood cellular composition (similar to ARIES) and the first 20 principal components (PCs) calculated from genome-wide genetic data of the GWAS variability. We considered as significant meQTLs the SNP-CpG pairs reaching a p-value < 1e-7 also in HELIX. Enrichment of CpGs in eQTMs for CpGs with meQTLs was computed using a Chi-square test.
Finally, we tested whether meQTLs were also eQTLs for the gene in the eQTM. To this end, we run eQTL analyses with MatrixEQTL adjusting for cohort, sex, age, blood cellular composition and the first 20 GWAS PCs in HELIX. We considered as significant eQTLs the SNP-TC pairs with p-value < 1e-7 and with the direction of the effect consistent with the direction of the meQTL and the eQTM. To test if eQTMs were cell type-specific, we used as a proxy of cell type specificity the Fstatistic of a linear regression between gene expression/CpG methylation levels in sorted blood cell types (outcome) vs. cell type (predictor), as described before [30]. Higher Fstatistics are assumed to be indicative of higher cell type specificity. DNA methylation and gene expression in sorted blood cell types were retrieved from [30] and the Blueprint project [31] (https://blueprint.haem.cam.ac.uk/bloodatlas/), respectively. Once gene/CpG cell type specificity was calculated and log 10 transformed (log 10 F-statistic), we tested its association with the eQTM category (model-shared or unadjusted cell-specific eQTMs) by fitting linear regressions.
Comparison with adult blood eQTM catalogues: GTP and

MESA
We compared our list of child blood autosomal cis-eQTMs obtained in HELIX with the eQTMs described in blood of two adult cohorts: GTP and MESA [11]. In order to compare eQTMs, the comparison was done at the gene symbol level. In HELIX, we selected the gene symbol of the most likely mRNA mapped to the transcript cluster (TC).
In GTP and MESA, we used the gene symbol annotation provided by the authors. As a result of this process, different TCs or expression probes were mapped to the same gene symbol. Thus, a CpG-TC pair in HELIX was mapped to multiple CpG-pairs in GTP and MESA and vice-versa. To handle this issue, we split our comparison in two pairs. First, we

Competing interests
The authors declare that they have no competing interests.