DNA methylation and general psychopathology in childhood: an epigenome-wide meta-analysis from the PACE consortium

The general psychopathology factor (GPF) has been proposed as a way to capture variance shared between psychiatric symptoms. Despite a growing body of evidence showing both genetic and environmental influences on GPF, the biological mechanisms underlying these influences remain unclear. In the current study, we conducted epigenome-wide meta-analyses to identify both probe- and region-level associations of DNA methylation (DNAm) with school-age general psychopathology in six cohorts from the Pregnancy And Childhood Epigenetics (PACE) Consortium. DNAm was examined both at birth (cord blood; prospective analysis) and during school-age (peripheral whole blood; cross-sectional analysis) in total samples of N = 2178 and N = 2190, respectively. At school-age, we identified one probe (cg11945228) located in the Bromodomain-containing protein 2 gene (BRD2) that negatively associated with GPF (p = 8.58 × 10–8). We also identified a significant differentially methylated region (DMR) at school-age (p = 1.63 × 10–8), implicating the SHC Adaptor Protein 4 (SHC4) gene and the EP300-interacting inhibitor of differentiation 1 (EID1) gene that have been previously implicated in multiple types of psychiatric disorders in adulthood, including obsessive compulsive disorder, schizophrenia, and major depressive disorder. In contrast, no prospective associations were identified with DNAm at birth. Taken together, results of this study revealed some evidence of an association between DNAm at school-age and GPF. Future research with larger samples is needed to further assess DNAm variation associated with GPF.


Introduction
Psychiatric disorders or symptoms co-occur more often than would be expected by chance alone [1,2]. In light of the negative clinical and functional outcomes associated with psychiatric co-occurrence [3,4], it is important to identify early indicators of risk and underlying biological mechanisms. There is accumulating evidence that, as early as in childhood, the shared variance between psychiatric disorders or symptoms may be usefully represented by a general psychopathology factor (GPF) [5][6][7][8]. This GPF in childhood has been found to show temporal stability [6] and to predict long-term functional and psychiatric outcomes in adolescence throughout adulthood [9,10]. Although previous research has found evidence for both genetic and environmental influences on GPF [7,[11][12][13][14][15][16], the biological mechanisms underlying these influences remain unclear.
One of the ways by which genetic and environmental factors might contribute to disease susceptibility is through epigenetic mechanisms that regulate gene expression, such as DNA methylation (DNAm) [17]. Studies have shown that variation in DNAm is influenced by a dynamic interplay of genetic and environmental factors [18]. In turn, alterations in DNAm patterns across the genome in peripheral tissues including cord blood, and peripheral blood have been found to associate with a wide range of child and adult mental health outcomes, such as conduct problems, attention deficit hyperactivity disorder (ADHD) symptoms, major depressive disorder (MDD), and schizophrenia [19][20][21][22]. Despite a growing body of research implicating an involvement of DNAm in individual mental health outcomes, much less work has focused on the relationship between DNAm and general psychopathology [23]. To the best of our knowledge, only one study examined the association between genome-wide DNAm patterns and GPF in childhood. In this study, data were analyzed cross-sectionally in one cohort, focusing on wider biological networks (so called 'modules') of co-methylated DNAm probes across the genome [23]. As such, we still lack knowledge on how GPF relates to single DNAm probes and the extent to which associations vary across time (i.e., both cross-sectional and prospective associations). Large multi-cohort epigenome-wide studies, which allow for increased power and generalizability, are needed to improve our understanding of the biological mechanisms underlying shared variance across mental health problems.
We conducted epigenome-wide meta-analyses to investigate both probe-level and regionbased associations of DNAm with school-age GPF in the Pregnancy And Childhood Epigenetics (PACE) Consortium. Because it is unclear at which time point differential DNAm may be most relevant to GPF, we examined DNAm both at birth (cord blood; prospective study; pre-symptom manifestation) and at school-age (peripheral whole blood; cross-sectional study) in pooled samples of N = 2178 and N = 2190 children, respectively.

Participants
The prospective analyses included four cohorts from PACE, using complete data on DNAm at birth, general psychopathology in childhood and covariates: the Avon Longitudinal Study of Parents and Children (ALSPAC), Drakenstein Child Health Study (DCHS), Generation R (GENR), and INfancia y Medio Ambiente (INMA). These cohorts have a combined sample size of 2178 (see Table 1). All prospective cohorts included participants of European ancestry, except for DCHS, which included participants of predominantly Black African ancestry or mixed ancestry. See Supplementary Methods for full cohort descriptions.
The cross-sectional analyses included four cohorts from the PACE consortium, using complete data on DNAm and general psychopathology in childhood, as well as covariates; ALSPAC, GENR, Glycyrrhizin in Licorice (GLAKU), and Human Early Life Exposome (HELIX; including six jointly analyzed sub cohorts). These cohorts have a combined sample size of 2190 (see Table 1). All cross-sectional cohorts included participants of European ancestry, except for HELIX, which included participants of European ancestry and participants with a Pakistani background living in the United Kingdom, which were analyzed as separate cohorts in our meta-analysis. was assessed with the Illumina Infinium  HumanMethylation450 (ALSPAC, DCHS, GENR, HELIX, INMA) or the Infinium HumanMethylationEPIC (DCHS, GLAKU) BeadChip assays in cord blood and in peripheral whole blood at ages 7-12 years. The cohorts performed sample processing, quality control (QC) and normalization based on their preferred protocols as described in the Supplementary methods. We used normalized, untransformed beta values, ranging from 0 (fully unmethylated) to 1 (fully methylated). Methylation levels that fell outside of the lower quartile minus 3×interquartile or upper quartile plus 3×interquartile range were removed.

DNA methylation-DNAm
We excluded probes with a call rate <90%, control probes, and probes that mapped to X/Y chromosomes. Following Zhou et al. [24], we further excluded probes with poor base pairing quality (lower than 40 on 0-60 scale), probes with non-unique 30 bp 3'-subsequence (with cross-hybridizing problems), Infinium II probes with SNPs of global MAF over 1% affecting the extension base, and probes with a SNP in the extension base that causes a color channel switch from the official annotation. We also excluded a subset of probes (n = 69) that have shown to be unreliable in a recent comparison of the Illumina 450 K and EPIC BeadChips [25]. At the meta-analysis level, we excluded probes which were available in <50% of the cohorts and <50% of the participants. After QC, 404,017 probes remained at birth and 413,497 probes remained at school-age.
General psychopathology factor-Mental health symptoms were assessed when children were aged 6-12 years, depending on the cohort. Parentrated instruments were used, including (i) the Child Behavior Checklist 6-18 (CBCL/6-18) in DCHS, GENR, GLAKU, HELIX, and INMA, and (ii) the Development and Well-being Assessment (DAWBA) in ALSPAC. Instruments are described in more detailed in the Supplementary methods. Whereas a single general factor loaded on all CBCL or DAWBA problem subscales, two specific factors loaded on internalizing (CBCL: anxious/ depressed, withdrawn/ depressed, somatic complaints; DAWBA: generalized anxiety disorder, major depressive disorder, social phobia, separation anxiety, specific phobia) versus externalizing (CBCL: rule-breaking behavior, aggressive behavior, attention problems; DAWBA: attention deficit hyperactivity disorder, oppositional defiant disorder, conduct disorder) subscales. For the CBCL, three subscales (social problems, thought problems, other problems) were indicators of the general factor but were not part of the specific internalizing or externalizing factors. Of note, GLAKU included only two of these three CBCL subscales as the 'other problems' subscale was not available.
The internalizing and externalizing factors were allowed to correlate with each other but not with the general factor. As such, the general factor represents the shared variance among all problem scales that is independent of the more specific internalizing and externalizing factors. Previous research reported negative correlations between the GPF and cognitive outcomes [10,14,26]. To support the criterion validity of the GPF, we estimated the correlation between the GPF and child cognition across the cohorts.
Covariates-We adjusted for the following potential confounders: child sex, gestational age at birth, child age at the assessment of outcome, maternal age, maternal educational level, prenatal maternal smoking status, cell-type proportions estimated using standard algorithms for DNAm at birth [27] or childhood [28], ancestry (depending on the specific cohort), and technical covariates (e.g., batch) (see Supplementary methods). To test the robustness of findings when using a different method to estimate cell-type proportions, we re-ran the cross-sectional EWAS analyses within the cohort with the largest sample size (HELIX, itself comprised of different participating cohorts) with newly estimated cell proportions using IDOL (Salas et al., 2018 [29]) instead of Houseman's approach [28].

Statistical analyses
General psychopathology factor-We used confirmatory factor analysis (CFA) to fit a general psychopathology model in the full samples with mental health data available (see Supplementary Information 1). Each cohort ran the CFA according to a predefined script, using the Lavaan statistical package [30] in R (https://www.r-project.org/). GPF scores were extracted, winsorized at +/-3SD, and standardized.
Cohort-specific EWAS-Each cohort ran the EWAS according to a predefined analysis plan, using robust linear regression (rlm; MASS R-package) to account for potential heteroscedasticity and non-normality. Cohorts excluded all multiple births and chose one random sibling per non-twin sibling pair.
Meta-analysis-The cohort-specific results were meta-analyzed at Erasmus MC Rotterdam. A shadow meta-analysis was conducted independently at the Barcelona Institute for Global Health. We performed an inverse-variance weighted fixed effects approach using R and METAL [31]. Probes were annotated using meffil [32]. Genome-wide significance was defined at the Bonferroni-corrected threshold of p < 1 × 10 −7 , and suggestive significance at p < 1 × 10 −5 . We included p < 1 × 10 −4 specifically for pathway and enrichment analyses to allow a sufficient number of genes to be included.
We ran two sensitivity meta-analyses. First, we included only cohorts of predominantly European ancestry to check if the results of the main analysis were influenced by ancestry. Second, we performed leave-one-out meta-analyses for hits showing genome-wide significant associations with GPF to ensure that associations were not driven by a single cohort.
Differentially methylated regions-Differentially methylated regions (DMRs) were identified using the dmrff package [33] in R. This method first identifies candidate DMRs by screening the meta-level EWAS results for genomic regions each covered by a sequence of CpG sites with EWAS effects in the same direction, EWAS p-values<0.05, and <500 bp gaps between consecutive CpG sites. Then, summary statistics are calculated for each candidate DMR within each of the cohorts by meta-analyzing the cohort-level EWAS summary statistics of the CpG sites in the region. Meta-analysis is performed by a variation of inverse weighted fixed effects meta-analysis that accounts for non-independence between CpG sites. Finally, for each candidate DMR, the summary statistics from each cohort are meta-analyzed to obtain a crosscohort meta-analyzed DMR statistic and p-value.
Follow-up analyses-Individual probes showing genome-wide or suggestive significance were looked up in the EWAS catalog [34] and EWAS atlas [35] to examine potential associations with exposures and health outcomes based on existing studies. To further characterize potential environmental -as well as genetic -influences on these sites, we used two different tools: 1) a heritability tool quantifying additive genetic influences as opposed to shared and non-shared environmental influences on DNAm, based on data from monozygotic and dizygotic twins; [36] and 2) the GoDMC database (http:// mqtldb.godmc.org.uk/) as a more specific tool for identifying genetic influences on DNAm levels via mQTL mapping. GoDMC is a large-scale collaborative effort including 36 cohorts (4 of which participated in this study: INMA, ALSPAC, GENR, GLAKU), based on whole blood from over 27,000 European samples. We characterized cross-tissue correspondence of DNAm using the Blood Brain DNA Methylation Comparison Tool by Hannon et al [37]., the Blood-Brain Epigenetic Concordance (BECon) [38], and the Iowa Methylation Array Graphing for Experimental Comparison of Peripheral tissue & Gray matter (IMAGE-CpG) [39]. To assess whether methylation levels of CpGs were associated with the expression levels of nearby genes in child blood, we consulted the HELIX Expression Quantitative Trait Methylation (eQTM) catalogue (https://helixomics.isglobal.org/), generated from samples overlapping with those included in this study (from the HELIX cohort). Finally, chromatin states associated to the most significant CpGs were assessed using ROADMAP blood 15 reference chromatin states (annotation and enrichment analysis conducted using the Enrichment module of the EASIER R package). Genome Browser (UCSC) was used to further explore the genomic context of the identified DMR.
To identify broader pathways and enrichment for molecular functions, we used the gene ontology (GO-biological processes, GO-molecular functions and GO-cellular components), the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, and the Molecular Signature Database (MSigDB) enrichment methods from the missMethyl R package [40], as implemented in the Functional Enrichment module of the EASIER R package [41]. We ran GWAS enrichment analyses for EWAS using the GenomicRanges Package [42], to identify genomic regions of EWAS suggestive hits (p < 1 × 10 -4 ) that overlapped with the 378 genome-wide significant loci previously reported in GWASs on general psychopathology [16], schizophrenia [43], neuroticism [44], ADHD [45] or anxiety [46] (0.5 Mb window centered to the genomic locus indicated in the original studies).

General psychopathology factor
All mental health subscales had significant loadings on the general factor across all cohorts, with all loadings >0.30. For full details on the GPF loadings, correlations, and model fit, see Supplementary Table 1. The loadings of the mental health subscales on the specific internalizing and externalizing factors tended to be lower and were less consistent across the cohorts, as were the correlations between these specific factors. In INMA and HELIX, a model including the correlation between the specific internalizing and externalizing factors did not fit the data well (see Supplementary Information 1). Therefore, in both INMA and HELIX, the specific internalizing and externalizing factors were not allowed to correlate (i.e., completely orthogonal model; see Supplementary Table 1). In line with previous research [5,7,14], GPF consistently negatively correlated with child cognition (see Supplementary methods) across the cohorts (mean r = −0.12, range = −0.08 to −0.13).

Epigenome-wide meta-analysis
Descriptive statistics across the cohorts are shown in Supplementary Table 2. We note that some differences were observed in GPF levels and sociodemographic characteristics between the full cohort samples and analytical subsamples used in the present study (see Supplementary Table 2). These differences varied depending on the specific cohort and variable examined. We prospectively examined associations of DNAm at birth (n = 2178) at 404,017 CpG sites with GPF at school-age. There was no evidence of genomic inflation in the cohort-specific EWASs (range λ = 0.95-1.14), nor in the meta-analysis (λ = 1.08, see also Fig. 1). As can be seen in Fig. 2, no CpG reached genome-wide significance at p < 1 × 10 −7 , with four CpGs showing p < 1 × 10 −5 (see Table 2). For the top hit (cg02084087), annotated to TNFRSF25 (TNF Receptor Superfamily Member 25), a 10-point increase in percentage methylation was related to a 0.43 SD increase in general psychopathology symptoms (p = 5.54 × 10 −6 ).
In the cross-sectional meta-analysis of DNAm at school-age (n = 2,190) at 413,497 sites, one CpG reached genome-wide significance (see Fig. 2). For this CpG probe (cg11945228), mapped to BRD2 (Bromodomain-containing protein 2 gene), a 10-point increase in percentage methylation was related to a 3.70 SD decrease in general psychopathology symptoms (p = 8.58 × 10 −8 ). Of note, there was a negative association between DNA methylation at this CpG and GPF in all cohorts except for the HELIX-Pakistani cohort. It is not possible based on the present data however to establish whether this may reflect an ancestry-specific association pattern or the influence of other cohort-specific factors ( Supplementary Fig. 2). Twenty other CpGs showed p < 1 × 10 −5 (see Table 3). These 21 top hits identified at school-age did not overlap with the ones observed at birth. Furthermore, as shown in Supplementary Table 3, the significant hit identified at school-age did not reach nominal significance (p < 0.05) at birth (B = 5.28, SE = 3.76, p = 0.16). Nominally significant probes identified in childhood correlated at r = 0.004, p = 0.55 (n = 23,764) with respective probes at birth. Sensitivity analyses-Restricting the meta-analysis to children with European ancestry did not change the overall pattern of results for both prospective (n=2027) and crosssectional (n=2125) studies, as evidenced by cross-meta-analysis correlations of effect estimates (r prospective = 0.99, r cross-sectional = 0.99) and consistent directions (95% and 96%, respectively) of effect estimates. The top hit identified at schoolage remained genome-wide significant (B=−38.02, SE=6.95, p = 4.47 × 10 −8 ).
Leave-one-out meta-analyses showed that the significant top hit identified during childhood (cg11945228) was robust to excluding all individual cohorts, except GENR (for a leave-one- Rijlaarsdam [28]. We found that the correlation between the regression beta coefficients for all CpGs was very high (r = 0.97) ( Supplementary Fig. 3), indicating that results are highly concordant when using these two different methods.

Differentially methylated regions
In the prospective analyses, there was no evidence of DMRs at birth associated with GPF.
In the cross-sectional analyses, one DMR at childhood was associated with GPF (estimate = 10166. 54

Follow-up analyses
All probes showing significant or suggestive associations with DNAm had twin heritability estimates available, showing mean additive genetic influences of r birth = 0.16 and r childhood = 9.44 × 10 −2 (see Supplementary Table 5).
Of the four suggestive probes identified at birth, three were associated with at least one known methylation quantitative trait locus (mQTL) (see Supplementary Table 5a) and one (cg09437808) showed a concordant DNAm pattern (r > 0.28, p < 0.01) between blood and several brain regions (the prefrontal cortex, entorhinal cortex, and the superior temporal gyrus) according to Hannon et al. tool (see Supplementary Table 6a). This positive correlation between blood and brain is also reported by IMAGE CPG tool (r = 0.35, p = 0.31) (Supplementary Table 7a) but not identified in BECon.
The genome-wide significant probe identified during childhood (cg11945228) was unrelated to known mQTLs and showed non-significant correlations between blood and brain DNAm (data only available in one of the three online tools used to assess this concordance) ( Supplementary Tables 5b and 6b). Of the 20 suggestive probes identified in childhood, ten were associated with at least one known mQTL (see Supplementary Table 5b) and four (cg22691524, cg09040034, cg25182716, cg18436008) showed a significant correlation between blood and at least one brain region DNAm (r > 0.25, p < 0.04; see Supplementary  Table 7b). None of the suggestive probes identified at birth or childhood showed links to an eQTM. According to EWAS Atlas and EWAS Catalogue, methylation levels at these top CpGs seem to be variable and sensitive to age, sex, tissue, or substance exposure (smoking, alcohol, polychlorinated biphenyls), and/or associated to several traits such as inflammatory and neurological diseases (rheumatoid arthritis, Behcet's disease, myalgic encephalomyelitis, multiple sclerosis, among others) (see Supplementary  Table 9).
The six probes in the DMR within the childhood analyses showed low evidence of genetic effects, as indicated by both twin-based estimates (mean variance explained by additive genetic influences r = 0.007) and the lack of associations with known mQTLs. One probe, cg05867423, was positively correlated between blood and brain according to data from Hannon et al. tool (r = 0.36, p < 0.002), with positive correlations also identified in the BECon and IMAGE CPG tools (see Supplementary Tables 6c, 7c, 8c). In addition, cg08455700 showed high correlations (r = 0.71) between blood and Brodmann area 20 according to the BECon tool (Supplementary Table 7c). None of these probes was related to eQTMs in blood.
Regarding chromatin states, we found that the genome-wide significant probe (cg11945228) and the DMR found at childhood were associated with active states (active transcription start site (TSS)-proximal promoter state and a transcribed state at the 5' and 3' end of genes showing both promoter and enhancer signatures) (Supplementary Table 10). In fact, an enrichment analysis for chromatin states revealed an overrepresentation of active states associated with zinc finger protein genes (ZNF/Rpts) within the most significant CpGs (p < 1 × 10-4) detected in the prospective meta-analysis (Supplementary Table 11a). In contrast, no consistent enrichment for active states vs repressed states was found based on the most significant CpGs detected in the cross-sectional meta-analysis (p < 1 × 10-4). However, we observed a significant underrepresentation of active transcription start site (TSS)proximal promoter states (TssAFlnk), and an overrepresentation of actively-transcribed states (Tx, TxWk) together with inactive quiescent states (Quies) (Supplementary  Fig. 4a). Hence, according to Chromatin interaction data (in situ Hi-C Chromatin Structure from a lymphoblastoid cell line), the genomic elements comprised in the region involving the two genes associated with the DMR seem to strongly interact with each other (Supplementary Fig. 4b).
GO, KEGG, and MSigDB analyses revealed no significantly enriched common biological processes, cellular components, molecular functions or pathways for the genes mapped to the probes at p < 1 × 10 −4 in the meta-analyses at birth (n = 56) and during childhood (n = 104) (see Supplementary Table 12a-f).
Results of the GWAS enrichment analyses for EWAS are presented in Supplementary  Table 13. Of the 56 probes at p < 1 × 10 −4 in the prospective EWAS meta-analysis, six overlapped with genomic loci previously linked to general psychopathology [16], schizophrenia [43], neuroticism [44], ADHD [45] or anxiety [46] based on GWASs. Of the 104 probes at p < 1 × 10 −4 in our cross-sectional EWAS meta-analysis, 13 (12.5%) overlapped with genomic loci previously linked to these psychiatric outcomes. Most notably, this cross-sectional enrichment analysis prioritized cg08514304 (TAOK2), which was among the top 10 suggestive hits identified in our cross-sectional EWAS meta-analysis and showed a consistent direction of effect in all cohorts. Finally, regarding the DMR, SNPs within the region comprising the associated genes SHC4 and EID1 have been related with psychiatric disorders such as major depressive disorder [47,48], bipolar disorder [49], mood and psychotic disorders [50], obsessive compulsive disorder [51], and schizophrenia [52] (Supplementary Table 14) (Supplementary Fig. 4b). Interestingly, both SHC4 and EID1 genes are highly expressed in the brain according to GTEx data ( Supplementary Fig. 4).

Discussion
We conducted the largest epigenome-wide meta-analysis of GPF in childhood, using DNAm assessments at two different time points (birth and childhood). The analyses revealed little evidence for probe-specific associations between DNAm in cord blood or peripheral blood and GPF. However, we did identify a significant DMR in childhood, implicating two relevant genes.
On the basis of probe-level genome-wide meta-analyses, we found that lower DNA methylation at cg11945228 at school-age was significantly associated with higher levels of GPF. Cg11945228 is located within the BRD2 gene, a BET (bromodomains and extra terminal domain) family chromatin adaptor that controls the transcription of a wide range of pro-inflammatory genes [53] and is involved in neural tube closure [54], neurogenesis [55], and neuroinflammation [56]. DNAm of the BRD2 promotor has been implicated in juvenile myoclonic epilepsy, a common adolescentonset genetic generalized epilepsy syndrome [57]. However, we advise caution when interpreting this specific site because, despite having low variation attributable to heterogeneity across the cohorts, its genome-wide significant association with GPF seems to be driven by one single cohort.
With regards to genes with probes at suggestive significance at school-age (WDR20, MOV10, and TAOK2), these have previously been linked to neurodevelopmental and psychiatric risk, such as autism spectrum disorder (ASD) and schizophrenia [58][59][60][61][62][63][64][65]. Pleiotropy was supported by our cross-sectional GWAS enrichment analyses for EWAS, showing that TAOK2 overlapped with genomic loci previously linked to schizophrenia [16,43], as well as obsessive compulsive disorder and bipolar disorder [16]. However, despite these previously established links with mental health outcomes, annotated genes of our overall top hits identified in the EWAS meta-analyses were not enriched for common biological processes, cellular components, molecular functions, or pathways.
The significant DMR identified at school-age included 6 CpGs mapped to SHC4 and EID1 genes, which are highly expressed in the brain. SHC4 regulates BDNF-induced MAPK activation [66] and EID1 plays an important role in the central nervous system [67], being involved in cell proliferation in the brain, synaptic plasticity and memory function. Interestingly, genetic variants in these genes have been previously implicated in multiple psychiatric disorders according to several studies (mostly GWAS), including bipolar disorder [49], obsessive-compulsive disorder [51], mood and psychotic disorders [50], schizophrenia [52], or MDD [47,48,68] (Supplementary Table 14) (Supplementary Fig. 4b). The fact that the DMR overlaps with active regulatory elements of these genes and shows evidence of blood-brain concordance for some of the CpGs supports the potential functional relevance of this region. Mechanistic studies will be needed in future to elucidate biological processes underlying the observed link between DNAm in this region and increased risk for multiple psychiatric outcomes. Of interest, despite similar sample sizes and measures (i.e., almost exclusively the CBCL), the top signals were very different between the prospective and cross-sectional EWASs, as evidenced for example by the lack of a correlation between nominally significant sites for these analyses. This low overlap might be due to the temporally dynamic nature of the methylome. DNAm patterns vary substantially over time [69] and can show temporally specific associations with outcomes, including psychiatric symptoms [20]. Unlike an existing EWAS meta-analysis on ADHD symptoms, which showed the strongest signal prospectively at birth compared to childhood [20], we did not detect any significant prospective associations. This is particularly interesting given the use of largely overlapping samples, suggesting that cord blood DNAm may capture risk for specific psychiatric problems (in this case ADHD) rather than a broader liability to psychopathology.
Strengths of this study include the large sample size and the use of DNAm at two different time points (birth and childhood), enabling the assessment of both prospective and crosssectional associations with GPF. Another important strength is the use of standardized protocols and scripts to fit GPF to the data in a multicohort setting. The GPF scores we analyzed were previously found to associate with a module of co-methylated DNAm probes across the genome [23], suggesting that it is possible to detect biological correlates of GPF using this study's measure. Furthermore, the current study showed that GPF consistently negatively correlated with child cognition across the cohorts as expected based on existing evidence [7], suggesting that it is capturing a similar, valid construct across the cohorts.
However, the current findings should also be interpreted in the context of several limitations. First, given the possibility of residual confounding and reverse causality, the direction of the observed associations cannot be inferred. DNAm might be a marker for unmeasured environmental factors that could affect GPF via independent pathways. Furthermore, children with higher levels of mental health problems may evoke a particular environment [70], which might affect DNAm. Second, our top hits were unrelated to eQTMs. Future research integrating transcriptomic data will be important for assessing the functional relevance of DNAm changes to gene expression in the brain. Third, because DNAm is tissue specific, our observations in peripheral blood may not reflect DNAm levels in other, potentially more relevant, tissues such as the brain. Despite potential sex differences in mental health problems [71] the current study did not examine sexspecificity for power reasons. Further, participating cohorts used different normalization pipelines, which may have contributed to cohort differences and influenced our results. In future, it would be optimal for meta-analytic studies to utilize a standardized processing pipeline across all samples. Furthermore, we found heterogeneity in CFA parameters, particularly for the specific internalizing and externalizing factors (especially in the GLAKU cohort). This precluded us from investigating whether, aside from the GPF, DNA methylation patterns also associate with variance that is unique to these symptom domains -an interesting question for future research. In future, it would be optimal for meta-analytic studies to utilize a standardized processing pipeline across all samples. Finally, the present findings are based on a predominantly European population and the cohorts are sampled from settings which largely have socialized healthcare, access to mental health services, and different cultural stigma surrounding mental health than other population groups. Future genome-wide studies with larger sample sizes are needed to replicate our findings in other ancestries and in more diverse settings to further characterize DNAm sites associated with GPF.
In summary, this large EWAS meta-analysis identified one probe (Cg11945228) for which lower DNAm in childhood was associated with higher levels of GPF. Furthermore, one DMR in childhood was associated with GPF. This DMR included 6 CpGs mapped to the SHC4 gene that has previously been implicated in multiple types of psychiatric disorders in adulthood. In contrast, no prospective associations were identified with DNAm patterns at birth. The current findings call for a more integrative approach to the study of GPF, using multiple omics sources, including the genome, epigenome, and transcriptome, to achieve a more comprehensive understanding of its biological underpinnings.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Data Availability
Site-level meta-analytical results will be made publicly available (Supplementary data file) upon acceptance for publication. For access to cohort-level data, requests can be sent directly to individual studies.

Code Availability
Analytical codes can be requested from authors. The diagonal line represents the distribution of the expected p-values under the null. Points above the diagonal refer to p-values that are lower than expected. The red line indicates genome-wide significance (p < 1 × 10 −7 ).