Prenatal exposure to common infections and newborn DNA methylation: A prospective, population-based study

Background: Infections during pregnancy have been robustly associated with adverse mental and physical health outcomes in offspring, yet the underlying molecular pathways remain largely unknown. Here, we examined whether exposure to common infections in utero is associated with DNA methylation (DNAm) patterns at birth and whether these patterns in turn relate to offspring health outcomes in the general population. Methods: Using data from 2,367 children from the Dutch population-based Generation R Study, we first performed an epigenome-wide association study to identify differentially methylated sites and regions at birth associated with prenatal infection exposure. We also examined the influence of infection timing by using self-reported cumulative infection scores for each trimester. Second, we sought to develop an aggregate methylation profile score (MPS) based on cord blood DNAm as an epigenetic proxy of prenatal infection exposure and tested whether this MPS prospectively associates with offspring health outcomes, including psychiatric symptoms, BMI, and asthma at ages 13-16 years. Third, we investigated whether prenatal infection exposure associates with offspring epigenetic age acceleration, a marker of biological aging. Across all analysis steps, we tested whether our findings replicate in 864 participants from an independent population-based cohort (ALSPAC, UK). Results: After multiple testing correction, we observed no differentially methylated sites or regions in cord blood in relation to prenatal infection exposure. Thirty-three DNAm sites showed suggestive associations (p<5×10⁻⁵), one of which was also nominally associated in ALSPAC, pointing to potential links with genes involved in immune, neurodevelopmental, and cardiovascular pathways.
While the MPS of prenatal infections was associated with maternal reports of infections in the internal hold-out sample of the Generation R Study (incremental R² = 0.049), it did not replicate in ALSPAC (incremental R² = 0.001) and did not prospectively associate with offspring health outcomes in either cohort. Moreover, we observed no association between prenatal exposure to infections and epigenetic age acceleration across cohorts and clocks. Conclusion: In contrast to prior studies, which reported DNAm differences in offspring exposed to severe infections in utero, we do not find evidence for associations between self-reported clinically evident common


INTRODUCTION
Exposure to prenatal infections is associated with a range of adverse developmental and health outcomes in offspring (1,2). For example, evidence from clinical and population-based studies points towards a link between prenatal infection exposure and birth complications (e.g., premature birth (3)), physical health problems (e.g., metabolic syndrome (3) and asthma (4)), as well as neurodevelopmental disorders (e.g., ADHD and schizophrenia (5)(6)(7)(8)) in offspring. While some of these effects may be transient, longitudinal studies suggest that prenatal infection exposure can have lasting effects on offspring health (9)(10)(11); for example, our prior work showed stable associations with elevated emotional and behavioral symptoms from toddlerhood into adolescence (12). Broadly, infections are known to influence the fetus in utero through both direct and indirect routes (9). Certain infectious pathogens (e.g., cytomegalovirus or herpes) can directly cross the placenta and the blood-brain barrier, disrupting processes such as fetal neuronal migration and synapse formation (9). Infections may also indirectly influence fetal development by activating the mother's immune system, in turn leading to placental insufficiency (due to elevated levels of placental immune markers) and/or fetal inflammation (9,13). Yet the specific molecular mechanisms through which these acute adverse conditions exert a lasting impact on the child remain unknown (9,11,13,14).
Epigenetic processes in the offspring, particularly DNA methylation (DNAm) at birth, represent a potential pathway of interest (9,15-18). DNAm can regulate gene activity through the addition of methyl groups to DNA, mostly at cytosines in CpG dinucleotides, in response to internal (e.g., genetic) and external (e.g., environmental) factors. DNAm has been suggested to play a key role in typical (neuro)development and aging, while alterations in DNAm have been associated with numerous mental and physical health disorders (19). Consequently, DNAm at birth may act as a biological marker (and potential mediator) of environmental influences, such as prenatal exposure to infections, on health outcomes. So far, epigenetic research on prenatal infections has primarily been conducted using experimental approaches. Preclinical animal studies have reported, for example, that DNAm alterations in placental tissue of exposed mice affect the expression of key genes involved in fetal development and metabolism (e.g., IGF-2) (15,20). DNAm changes have also been observed in immune cells (e.g., IFN-γ and IL-4, linked to early childhood diseases such as asthma (20)), as well as in brain cells of exposed offspring (15). Additionally, studies using human-derived cellular and organoid models have shown that exposure to the Zika virus leads to DNAm changes in neural progenitor cells, astrocytes, and differentiated neurons, particularly at genes implicated in neuropsychiatric disorders such as schizophrenia (21,22). In humans, most epigenetic studies have focused on broader markers of inflammation rather than on (prenatal) exposure to infections (23)(24)(25)(26)(27). For example, several population-level, multi-cohort epigenome-wide association studies (EWAS) of C-reactive protein (CRP) levels, one of the most commonly examined inflammatory markers, have revealed associations with DNAm at a large number of sites in whole blood (24,26,27). These sites can be aggregated into a single methylation profile score (MPS) and used as an
epigenetic proxy of CRP levels, which has been associated with (neuro)developmental and health outcomes in both pediatric and adult samples (24,26-29). In contrast, studies on prenatal infections have so far focused on severe infections (e.g., hepatitis B, Zika virus, HIV, or SARS-CoV-2) in small, selected samples. These have shown, for example, that prenatal exposure to maternal hepatitis B infection is associated with global as well as locus-specific differences in DNAm in offspring cord blood (n=12) (30). Another study comparing neonates with congenital Zika virus-related microcephaly (n=18) to controls (n=20) observed differential DNAm patterns in blood between the groups, at genes involved in fetal (neuro)development and viral host immunity (31). With regard to perinatally acquired HIV, two studies investigated DNAm levels in exposed children (32,33), reporting a large number of differentially methylated sites at mean age 7 (primarily in genes involved in adaptive immunity, including the MHC region) (n=120) (32), as well as epigenetic age acceleration (i.e., an epigenetically estimated age older than chronological age, a risk marker for poor health and age-related disease) from 11 to 17 years (n=32) (33). Lastly, two studies examined the association between prenatal exposure to SARS-CoV-2 infection and DNAm in cord blood (n=19) (34) and in infant buccal cells (n=4) (35), with suggestive differentially methylated sites once again implicating adaptive immunity (MHC region), as well as genes associated with numerous diseases. Despite this growing evidence base linking prenatal infection exposure, inflammation, and epigenetic programming in offspring (15), key gaps remain. First, because of the current focus on severe infections (e.g., HIV or Zika virus) in small, selected, high-risk samples, little is known about how exposure to common infections may relate to offspring DNAm levels within the general population. Second, because
infection exposure is typically measured only once, it is unclear to what extent associations with offspring DNAm may vary by timing of infection. Research leveraging data from i) multiple cohorts that feature comparable measures and ii) infection exposure at multiple time points would help to assess the robustness and generalizability of observed associations. Third, to our knowledge, no study has tested whether differential DNAm related to infection exposure in utero is also associated with relevant offspring health outcomes, such as psychiatric, respiratory, or cardiometabolic phenotypes.
To address these gaps, we investigated associations between cumulative prenatal exposure to self-reported (common) infections and offspring cord blood DNAm patterns at birth, based on data from 2,367 children from the population-based Generation R Study. First, we performed an EWAS to identify differentially methylated sites and regions in relation to prenatal infection exposure. As a sensitivity analysis, we also examined the role of infection timing by repeating our analyses using trimester-specific infection scores. Second, we developed an MPS of prenatal infections based on cord blood DNAm and tested whether this score prospectively associates with offspring health outcomes measured in early adolescence, namely child psychiatric symptoms, body mass index (BMI), and asthma. Third, we investigated whether prenatal exposure to infections associates with epigenetic age acceleration, as a marker of biological aging, using two gestational epigenetic clocks. Across all steps, we sought to validate any findings in 864 children from an independent population-based cohort, the Avon Longitudinal Study of Parents and Children (ALSPAC).

Study selection and participants
We used data from two independent population-based birth cohorts: the Generation R Study (Generation R) (36,37) as the discovery sample and the Avon Longitudinal Study of Parents and Children (ALSPAC) (38)(39)(40)(41) as the replication sample. A brief description of both cohorts can be found in Supplemental text A. Both studies were approved by the local Medical Ethics Committees, and written informed consent was obtained for all participants.
To be included in our study, epigenome-wide DNAm data from cord blood had to be available. Analyses were restricted to participants of European ancestry (determined by self-reported questionnaire data), as epigenetic data were only available for this group within Generation R. We excluded twins. For siblings, only one member was included in analyses, selected based on data completeness or, if equal, at random. This resulted in a total analytical sample of 2,338 children for Generation R and 864 children for ALSPAC.

Assessment of prenatal infection exposure
Generation R: We used a previously constructed cumulative prenatal score of common clinically evident infections (7). This score was derived from self-reported questionnaire data collected at three stages of pregnancy, with one survey administered in each trimester. The questionnaire asked women to report on various infection-related items, including i) upper respiratory tract infections, ii) lower respiratory tract infections, iii) gastrointestinal infections, iv) cystitis/pyelitis, v) dermatitis, vi) eye infections, vii) herpes zoster, viii) influenza, ix) sexually transmitted diseases, and x) instances of fever (>38°C/100.4°F). Based on the responses, we created four scores: one for each trimester (trimester-based) and one covering the entire pregnancy. Each infection condition reported within a trimester ('yes') was assigned one point, while the absence of an infection condition ('no') received zero points. The total score for each trimester (maximum = 10 points) and the cumulative score for the entire pregnancy (maximum = 30 points) were then calculated.
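The following minimal Python sketch shows how the trimester-specific and cumulative scores follow from the yes/no items. The item keys are hypothetical labels, not Generation R variable names. Treating an unanswered item as 'no' is a simplification for illustration only; the study imputed missing data.

```python
"""Sketch: trimester-specific and cumulative prenatal infection scores."""

# Hypothetical labels for the ten questionnaire items listed above.
ITEMS = (
    "upper_respiratory", "lower_respiratory", "gastrointestinal",
    "cystitis_pyelitis", "dermatitis", "eye_infection", "herpes_zoster",
    "influenza", "sexually_transmitted", "fever_over_38c",
)


def trimester_score(responses: dict) -> int:
    """One point per item answered 'yes' (True); maximum 10 per trimester.

    Items absent from `responses` count as 'no' (illustrative simplification).
    """
    return sum(1 for item in ITEMS if responses.get(item, False))


def infection_scores(t1: dict, t2: dict, t3: dict) -> dict:
    """Return the three trimester scores plus the cumulative score (max 30)."""
    scores = {
        f"trimester_{i}": trimester_score(responses)
        for i, responses in enumerate((t1, t2, t3), start=1)
    }
    # The right-hand side is evaluated before "total" is added to the dict.
    scores["total"] = sum(scores.values())
    return scores
```

For example, `infection_scores({"influenza": True, "fever_over_38c": True}, {}, {"cystitis_pyelitis": True})` gives two points in the first trimester, one in the third, and a cumulative score of 3.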
The rationale for using the cumulative score is two-fold: i) we hypothesize that activation of the mother's immune system, rather than specific effects of individual infections, may be an important pathway through which common clinically evident infections exert their effect on offspring (9,11), and ii) it increases the statistical power of our exposure assessment, given the variability in prevalence across different types of infections. Of note, DNAm in blood was not assessed concurrently with the self-reported infection measurements. Additionally, no biological indicators of infection were available at the time of the self-reports.

ALSPAC:
We constructed a corresponding cumulative score of prenatal infection. However, only information on urinary tract infections and influenza could be included (i.e., 2 out of the 8 types of infections available in Generation R). We used self-reported questionnaire data collected at 18 weeks of gestation (reporting on infections up to month 4 of pregnancy), 32 weeks of gestation (reporting on infections in months 4-7 of pregnancy), and 8 weeks postpartum (reporting on infections in months 8-9 of pregnancy) (42). To define infections in the second trimester, we combined the information on the fourth month of pregnancy from the 18-week questionnaire with the 32-week questionnaire. Similarly to Generation R, we created both trimester-specific and total prenatal infection scores based on these data. As the Generation R and ALSPAC infection scores had different scales, we standardized both scores to a mean of 0 and an SD of 1 to enable comparison.
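The next sketch covers the ALSPAC scoring and the z-standardization. It assumes that reports arrive as (item, month) pairs. The month-to-trimester mapping is our reading of the description above; in particular, treating months 1-3 as the first trimester is an inference. The item labels are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical mapping inferred from the text: months 1-3 (18-week
# questionnaire) -> trimester 1; month 4 (18-week questionnaire) plus
# months 4-7 (32-week questionnaire) -> trimester 2; months 8-9
# (8-week postpartum questionnaire) -> trimester 3.
MONTH_TO_TRIMESTER = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}
ALSPAC_ITEMS = {"urinary_tract_infection", "influenza"}  # hypothetical labels


def alspac_scores(reports):
    """Score (item, month) reports: one point per item per trimester (max 2)."""
    seen = {1: set(), 2: set(), 3: set()}
    for item, month in reports:
        if item not in ALSPAC_ITEMS:
            raise ValueError(f"unexpected item: {item}")
        # Repeated reports of the same item within a trimester count once.
        seen[MONTH_TO_TRIMESTER[month]].add(item)
    scores = {f"trimester_{t}": len(items) for t, items in seen.items()}
    scores["total"] = sum(scores.values())
    return scores


def standardize(values):
    """Z-score to mean 0 and SD 1, using the sample SD as R's scale() does."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]
```

Standardizing each cohort's scores separately puts them on a common SD scale. Effect sizes can then be compared per SD of infection burden, even though the raw maxima differ.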

Assessment of cord blood DNAm profiles and gestational epigenetic clocks
In both cohorts, umbilical cord blood was drawn at birth. In Generation R, DNAm profiles were generated with either the Illumina Infinium HumanMethylation450 BeadChip array (Illumina Inc., San Diego, CA) (GenerationR 450k: n=1,367) or the Illumina MethylationEPIC 850K array (Illumina Inc., San Diego, CA) (GenerationR EPIC: n=971). In ALSPAC, all samples were profiled using the 450K array (n=864). Supplementary text B describes the sample processing, quality control, and normalization steps in detail. In both cohorts, we used normalized, untransformed beta values ranging from 0 (i.e., fully unmethylated) to 1 (i.e., fully methylated). Before analysis, we removed extreme outliers lying more than 3 times the interquartile range (IQR) beyond the quartiles (i.e., below the 25th percentile minus 3×IQR or above the 75th percentile plus 3×IQR). We further excluded probes that i) mapped to the X and Y chromosomes, ii) overlapped with single-nucleotide polymorphisms, or iii) were control or cross-reactive probes (targeting repetitive sequences/co-hybridizing to alternate sequences) (43,44).
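The per-probe outlier rule can be sketched in Python as follows. Two implementation details are assumptions, since the text does not state them. First, "removing" an outlier is taken to mean setting that beta value to missing. Second, the quartiles are linearly interpolated with `method="inclusive"`, which matches R's default `quantile()` type 7.

```python
from statistics import quantiles


def mask_extreme_outliers(betas, k=3.0):
    """Set beta values outside [Q1 - k*IQR, Q3 + k*IQR] to None for one probe.

    `betas` holds one probe's values across samples; existing None entries
    (already missing) are ignored when computing the quartiles and stay None.
    """
    observed = [b for b in betas if b is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [b if b is not None and lower <= b <= upper else None for b in betas]
```

In this sketch, applying the function probe by probe across the beta matrix leaves the other samples' values for that probe untouched.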
We used two different gestational epigenetic clocks to compute estimates of epigenetic age acceleration (EAA) at birth, in order to test the robustness of associations and maximize comparability with existing studies on gestational epigenetic clocks (45). The Bohlin clock uses DNAm levels at 96 specific CpG sites from the 450K array (46). The 450K/EPIC overlap clock (47) is constructed from 173 CpG sites shared between the 450K and EPIC arrays. In the original studies, the sites for both clocks were selected using Lasso regression. We used the methylclock R package to compute EAA and to impute missing values, provided that fewer than 20% of the clock CpGs were missing (GenerationR 450k: 0 missing for both clocks; GenerationR EPIC: 10 CpGs missing for the Bohlin clock [10.4%] and 1 CpG missing for the 450K/EPIC overlap clock [0.6%]; ALSPAC 450k: 0 missing for both clocks). For both clocks, we calculated residual EAA (in weeks), defined as the residuals from a linear regression of DNAm-estimated gestational age on chronological gestational age (45,48). Unlike raw EAA, which is a simple difference score between epigenetic and chronological gestational age, residual EAA provides a measure of epigenetic gestational age acceleration that is independent of chronological age (48). Residual EAA values can be either positive (indicating a higher epigenetic gestational age than expected given the clinical gestational age) or negative (indicating a lower epigenetic gestational age than expected).

Child phenotypes
As part of our MPS validation, we included information on three child health outcomes (psychiatric symptoms, BMI, asthma) that have been linked to both (i) prenatal infections and (ii) DNAm levels in cord blood in previous population-based research.
Generation R: we used the parent-reported Child Behavior Checklist (CBCL) to measure psychiatric symptoms when the child was 13-16 years old (49)(50)(51), yielding dimensional scores for total behavioral problems, internalizing problems, and externalizing problems. At the same age, height (m) and weight (kg) were measured at the research center, from which the child's BMI was calculated. Subsequently, BMI-SDS was computed, adjusting for age and sex according to Dutch reference growth curves (52). Moreover, during this research wave, parents reported via questionnaire whether the child had previously been diagnosed with asthma (yes/no).
ALSPAC: we used the Strengths and Difficulties Questionnaire (SDQ) to obtain information on general behavioral symptom scales, specifically total difficulties, emotional problems, hyperactivity/inattention, and conduct problems, at a mean child age of 15 years (53). Height (m) and weight (kg) were measured when the child visited the research center at a mean age of 17 years, from which BMI was calculated. A parent-reported questionnaire was administered when the child was 13 years old to inquire about any prior asthma diagnosis.

Covariates
We adjusted for the following potential confounders: maternal age at delivery, maternal education, maternal smoking during pregnancy, parity, gestational age at birth, child sex, batch effects, and methylation-derived estimates of white blood cell type proportions. Details per cohort are provided below.
Generation R: maternal age at enrollment was calculated as the difference between the mother's date of birth and the date of enrollment in the study. Parity and maternal education were prospectively assessed with a questionnaire at enrollment. Maternal tobacco use ('no', 'yes, until pregnancy was known', and 'yes, continued during pregnancy') was assessed using questionnaires in all three trimesters. Based on Statistics Netherlands classifications, three categories were created for maternal education: 'primary' (no education or primary school), 'intermediate' (secondary school or lower vocational training), and 'high' (higher vocational training or university). Gestational age at birth (weeks) was calculated based on the date of the mother's last menstrual period or from ultrasound. Child sex was obtained from medical records. Sample plate was used to adjust for batch effects (technical covariate). To adjust for white blood cell type proportions (B-cells, CD4-T cells, CD8-T cells, granulocytes, monocytes, natural-killer cells, nucleated red blood cells), we used a cord-blood-specific method (54).
ALSPAC: maternal age at enrollment was calculated as the difference between the mother's date of birth and the date of enrollment in the study. Information on maternal tobacco use during pregnancy ('never', 'stopped smoking', 'continued smoking during pregnancy') and parity was obtained through self-reported questionnaires. Maternal education was measured in line with the ISCED system ('no education' = 0, 'compulsory schooling [CSE]' = 1, 'O-level' = 2, 'University' = 3). Gestational age at birth (in weeks) was determined using either the date of the mother's last menstrual period or ultrasound measurements. Child sex was obtained from hospital registries. To adjust for batch effects, we used 20 surrogate variables (sva function from meffil package) (55). Blood cell type proportions (B-cells, CD4-T cells, CD8-T cells, granulocytes, monocytes, natural-killer cells, nucleated red blood cells) were estimated with a cord-blood-specific method (54).

EWAS of prenatal infection exposure
For our first aim, we conducted a probe-level EWAS using linear regression models to identify CpG sites at birth (outcome) associated with prenatal infection exposure. In our discovery sample, Generation R, analyses were restricted to CpG sites overlapping between the 450K and EPIC arrays (n=393,360) and were run separately per array (i.e., the GenerationR 450k and GenerationR EPIC subsamples were analyzed separately). Summary statistics for these two subsamples were then pooled via a standard inverse-variance weighted fixed-effects meta-analysis (metafor R package) (56). To account for multiple testing, we defined genome-wide significance based on a 450K array p-value threshold of p<2.4×10⁻⁷ (57), suggestive significance as p<5×10⁻⁵, and nominal significance as p<0.05. Top-ranking probes identified in Generation R (i.e., those meeting the suggestive significance threshold of p<5×10⁻⁵) were then tested in ALSPAC as an independent replication sample (Figure 1).
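The pooling of the two Generation R array subsamples follows the standard inverse-variance weighted fixed-effects formula. Each estimate is weighted by 1/SE², the pooled SE is the square root of the inverse of the summed weights, and a two-sided p-value follows from the normal distribution. Below is a minimal Python sketch of that formula, not the metafor code, together with the three significance tiers defined above.

```python
from math import sqrt
from statistics import NormalDist


def ivw_fixed_effect(betas, ses):
    """Inverse-variance weighted fixed-effects pooling of per-array estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value; using cdf(-|z|) keeps precision for very small p.
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return pooled_beta, pooled_se, p_value


def significance_tier(p):
    """Classify a p-value using the thresholds defined in the text."""
    if p < 2.4e-7:
        return "genome-wide"
    if p < 5e-5:
        return "suggestive"
    if p < 0.05:
        return "nominal"
    return "not significant"
```

With equal SEs, the pooled estimate is simply the mean of the two arrays' estimates, and the pooled SE shrinks by a factor of √2.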
As a supplementary analysis, we ran a second EWAS within the GenerationR EPIC subsample to examine associations between prenatal infection and EPIC-only CpG sites (n=414,823). Given the smaller sample size and the lack of a replication sample (both GenerationR 450k and ALSPAC are based on the 450K array), these analyses were exploratory and intended to generate hypotheses for future research, motivated by the dearth of previous epigenetic studies using this increasingly widespread, newer array in relation to infection-relevant exposures.

Characterization of EWAS results
Probes were annotated using the BiocManager and IlluminaHumanMethylation450kanno.ilmn12.hg19 packages in R, based on the hg19 genome build. Top-ranking probes (p<5×10⁻⁵) were examined using a range of openly accessible resources to characterize (i) potential genetic influences, (ii) associations with gene expression, (iii) enriched biological pathways, and (iv) reported links to exposures and outcomes in published research. First, we examined whether top-ranking sites are known to associate with common genetic variants in blood (i.e., methylation quantitative trait loci [mQTL]) using the GoDMC database (http://mqtldb.godmc.org.uk). We also qualitatively used the GWAS Catalog (https://www.ebi.ac.uk/gwas/) to establish whether genes annotated to top-ranking probes were (i) enriched within existing genome-wide association studies (GWAS) of infection-related traits and (ii) identified as top hits for specific exposures/traits in published GWAS. Second, we used the HELIX Web catalogue (https://helixomics.isglobal.org) to examine whether top-ranking sites are associated with gene expression levels in blood (i.e., expression quantitative trait methylation [eQTM]). Third, we examined whether genes annotated to top-ranking probes were enriched for molecular pathways and functions via gene ontology analyses (gometh and kegg; missMethyl R package) (58). Additionally, we performed an enrichment analysis using the C7 immunological signature gene sets database (https://www.gsea-msigdb.org/gsea/msigdb/). We considered an FDR-adjusted p-value (pFDR) below 0.05 as the significance threshold to identify independent pathways in the enrichment analyses. Finally, top-ranking probes were queried in available EWAS databases (EWAS Catalog [https://www.ewascatalog.org] and EWAS Atlas [https://ngdc.cncb.ac.cn/ewas/atlas]) to identify reported associations with exposures and outcomes in published EWAS.
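FDR correction is commonly performed with the Benjamini-Hochberg step-up procedure. The text does not state which FDR method each enrichment tool applies, so the following Python sketch is illustrative only. Its output is equivalent to R's `p.adjust(method = "BH")`.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each sorted p-value is scaled by m / rank, and a running minimum taken
    from the largest p-value downwards keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A pathway or gene set would then count as significantly enriched when its adjusted value is below 0.05.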

Regional-level epigenome-wide analysis
In addition to the probe-level EWAS, we performed differentially methylated region (DMR) analyses to account for the correlated structure of DNAm and to identify broader genomic regions that are differentially methylated in relation to prenatal infections (59,60). Of note, regional analyses reduce the multiple testing burden and can also detect weaker signals spread over wider regions. We selected potential DMRs by identifying genomic regions tagged by nominally significant CpG sites that are at most 500 bp apart. Correlations between sites are taken into account to avoid inflating regional statistics. We performed the regional analyses in Generation R for the 450K and EPIC arrays separately with the dmrff R package and meta-analyzed summary statistics across arrays with the package's dmrff.meta function (61). Following the default package settings, DMRs were considered significant at pBonferroni<0.05. If DMRs were identified in Generation R (pBonferroni<0.05), they were then tested for replication in ALSPAC.
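The candidate-region step described above can be sketched as grouping nominally significant CpGs on the same chromosome whose neighbours lie at most 500 bp apart. This Python sketch is a simplification. It assumes a minimum of two CpGs per region, which is a hypothetical choice, and ignores the direction of effects. It also omits the correlation-adjusted regional statistics that dmrff computes.

```python
def candidate_regions(sites, max_gap=500, p_threshold=0.05, min_sites=2):
    """Group nominally significant CpGs into candidate regions.

    `sites` is an iterable of (chromosome, position, p_value) tuples.
    Returns (chromosome, start, end, n_sites) for each candidate region.
    """
    nominal = sorted((chrom, pos) for chrom, pos, p in sites if p < p_threshold)
    regions, current = [], []

    def close(run):
        # Keep a run only if it contains enough CpGs (hypothetical minimum).
        if len(run) >= min_sites:
            regions.append((run[0][0], run[0][1], run[-1][1], len(run)))

    for chrom, pos in nominal:
        same_run = (
            current
            and chrom == current[-1][0]
            and pos - current[-1][1] <= max_gap
        )
        if same_run:
            current.append((chrom, pos))
        else:
            close(current)
            current = [(chrom, pos)]
    close(current)
    return regions
```

Each run starts a new region whenever the next significant CpG is on another chromosome or more than 500 bp away, so regions can extend beyond 500 bp in total length.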

Methylation profile score development
To address our second aim, we developed an MPS of prenatal infections (62) (Figure 1). The first step involved feature selection (63). We selected prenatal infection-related CpG sites identified in our EWAS discovery analyses (i.e., in Generation R) at two different thresholds (nominal: p<0.05; suggestive: p<5×10⁻⁵) to establish which threshold resulted in a better-performing MPS. Second, we split our discovery Generation R cohort into a training dataset (80%, n=1,871) and a test dataset (20%, n=467) with the caret R package (64), ensuring an even distribution of 450K/EPIC array type in both datasets.
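An array-stratified 80/20 split can be sketched by shuffling samples within each array type and assigning 80% of each stratum to training. This Python sketch rounds the training count per stratum, which is an assumption; caret's `createDataPartition` has its own allocation rules, and the seed is hypothetical. With 1,367 450K and 971 EPIC samples, the rounding gives 1,094 + 777 = 1,871 training and 467 test samples, which matches the reported counts.

```python
import random


def stratified_split(sample_ids, array_type, train_fraction=0.8, seed=1):
    """Split samples into train/test sets, stratified by array type.

    `array_type` maps each sample ID to its array (e.g. "450K" or "EPIC").
    """
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    strata = {}
    for sid in sample_ids:
        strata.setdefault(array_type[sid], []).append(sid)
    train, test = [], []
    for _, ids in sorted(strata.items()):
        ids = list(ids)
        rng.shuffle(ids)
        n_train = round(train_fraction * len(ids))
        train.extend(ids[:n_train])
        test.extend(ids[n_train:])
    return train, test
```

Stratifying by array keeps the 450K/EPIC ratio nearly identical in both sets, so array-specific measurement differences do not masquerade as generalization error.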
We used elastic net regularization, a machine-learning approach, to specify the prediction model and select the optimal alpha and lambda hyperparameters (65) (for more information on the method, see the extended method section in Supplementary text C).
In the Generation R training set, we used 10-fold cross-validation in the glmnetUtils R package (66) to determine the optimal combination of the hyperparameters alpha (the mixing between LASSO and ridge regularization) and lambda (the strength of regularization, i.e., shrinkage), selecting the combination with the smallest mean squared error (MSE). Using these optimal alpha and lambda values, we extracted the CpGs with non-zero coefficients in the elastic net model. Then, in the Generation R test set, we created the MPS for prenatal infection (MPS Prenatal_Infection) by multiplying the methylation beta value of each selected CpG by its estimated weight (the coefficients of the elastic net model fitted in the Generation R training set) and summing these weighted methylation values into a single score, which was then standardized.
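Scoring a sample with the MPS amounts to a weighted sum of beta values followed by standardization. The minimal Python sketch below assumes that the elastic net weights are available as a CpG-to-coefficient mapping; the CpG names in the example are hypothetical. The model intercept is omitted because it cancels out after standardization.

```python
from statistics import mean, stdev


def methylation_profile_score(betas_by_sample, weights):
    """Standardized weighted sum of beta values over the selected CpGs.

    `betas_by_sample` is a list of dicts mapping CpG ID -> beta value;
    `weights` maps CpG ID -> elastic net coefficient (non-zero only).
    A CpG missing from a sample raises KeyError rather than being skipped.
    """
    raw = [
        sum(w * sample[cpg] for cpg, w in weights.items())
        for sample in betas_by_sample
    ]
    mu, sd = mean(raw), stdev(raw)
    return [(r - mu) / sd for r in raw]
```

The same function applies unchanged to an external cohort, provided every weighted CpG passed that cohort's quality control.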
Third, we validated the MPS Prenatal_Infection in the Generation R test set, i.e., in data not used for model training, to guard against overfitting. We performed the following steps: i) we ran step-wise linear regression models to test whether the MPS Prenatal_Infection associates as expected with the prenatal infection score (based on maternal self-reports) and whether it adds explanatory power over covariates alone (incremental R²), and ii) we ran regression models to examine whether the MPS Prenatal_Infection at birth prospectively associates with later offspring health outcomes at age 13-16 years (i.e., child psychiatric symptoms [linear], BMI [linear], and asthma [logistic]).
Finally, we performed external validation of the MPS Prenatal_Infection in ALSPAC. We multiplied the weights estimated in the Generation R training set by the methylation beta values of the corresponding CpGs in ALSPAC and summed these into a single beta-weighted sum score for prenatal infections. As in the internal validation on the Generation R test set, we then ran regression models to establish whether the MPS Prenatal_Infection associates with (i) measured prenatal infections (also over and above covariates, based on incremental R²) and (ii) offspring health outcomes at age 13-17 years.

Gestational epigenetic clocks
For our third aim, we applied linear regressions with the prenatal infection sum score as the exposure and residual EAA estimates as the outcome, separately for each gestational epigenetic clock (Figure 1).In Generation R, we performed this analysis in the 450K array and EPIC array samples separately, and then meta-analyzed the results using the metafor package in R. The associations between prenatal infection and both gestational epigenetic clocks were also tested in ALSPAC.

Sensitivity analysis
Three sensitivity analyses were performed. First, all models (aims 1-3) were repeated using trimester-based scores of prenatal infections in both cohorts (as opposed to the total prenatal infection score), to explore the potential influence of infection timing during pregnancy on offspring DNAm at birth. Second, because the prenatal infection score in ALSPAC contains only two domains (compared to ten in the discovery Generation R infection sum score), which limits comparability between cohorts and could lead to discrepant findings, we created an abbreviated infection score in Generation R (containing only 'flu' and 'urinary tract infections', as in ALSPAC). Top-ranking probes from our discovery EWAS analysis (aim 1) were then reanalyzed to gauge the influence of using a more comprehensive versus an abbreviated exposure score on infection-DNAm associations. Third, considering that cell type proportions may be influenced by infections, potentially mediating the effects of prenatal infection on DNAm, we conducted a sensitivity analysis of the main prenatal infection EWAS without adjusting for cell type proportions.

Missing values and covariates
All analyses (i.e., aims 1-3) were adjusted for relevant covariates using a core Model 1 and an extended Model 2. Model 1 included child sex, maternal age at delivery, maternal education, maternal tobacco use, parity, batch effects (sample plate), and cell type proportions. Model 2 was additionally adjusted for gestational age at birth, a variable that could qualify as a potential mediator but is also strongly associated with the outcome. Missing exposure and covariate data were imputed with the mice R package (30 datasets, 50 iterations); we then selected a single imputed dataset, which was used for all further analyses (i.e., single imputation). The maximum missingness of exposure, covariates, and child outcomes was 24% in Generation R and 16% in ALSPAC. All statistical analyses were performed with R Statistical Software (version 4.3.1; R Development Core Team); the code used for this project is publicly available at https://github.com/ajsuleri/Prenatal_infections_EWAS. Moreover, the summary statistics of all EWAS results are available on figshare (https://figshare.com/projects/Prenatal_exposure_to_common_infections_and_newborn_DNA_methylation_A_prospective_population-based_study/211576). A power analysis to calculate the minimum detectable effect size per cohort can be found in the extended method section (Supplementary Text C).

RESULTS
Sample characteristics for Generation R (GenerationR 450k, n=1,367; GenerationR EPIC, n=971) and ALSPAC (n=864) can be found in Table 1.The frequency distribution of the infection sum scores in both cohorts is shown in Figure 2.

Is prenatal infection exposure associated with site- and region-level DNAm in offspring at birth?
Based on our site-level EWAS, pooling data from 2,338 mother-child dyads, we did not identify any CpG site significantly associated with prenatal exposure to infection after genome-wide correction for multiple testing (p<2.4×10⁻⁷). The EWAS Manhattan plot is shown in Figure 3A and the QQ-plot in Figure 3B, which showed no evidence of genomic inflation (λ=1.023). The 33 top-ranking CpG sites associated with prenatal infection exposure at a suggestive significance threshold (p<5×10⁻⁵) are listed in Table 2. Of note, these and the following results were derived from Model 1, given that the results of Models 1 and 2 were nearly identical (i.e., betas and standard errors agreed to at least the second or third decimal place). In our sensitivity analyses, we found no association between the trimester-specific infection scores and DNAm levels at birth (Table S1, Figures S1-S6). We further found no association between prenatal exposure to infections and DNAm at the 414,823 EPIC-only CpG sites in cord blood after genome-wide correction (λ=0.977). In the sensitivity analysis of total infections without adjustment for cell type proportions, results were consistent (see Table S2 and Figure S7).
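The text reports the genomic inflation factor λ without stating how it was computed. A common definition is the median of the observed 1-df chi-square statistics divided by the expected median under the null, which is about 0.455. A minimal Python sketch under that assumption follows.

```python
from statistics import NormalDist, median


def genomic_inflation(pvalues):
    """Genomic inflation factor λ from two-sided p-values (0 < p <= 1).

    Each p-value is converted to a 1-df chi-square statistic via
    z = Phi^-1(p / 2), chi2 = z**2, and the median is divided by the
    expected null median (about 0.455).
    """
    std_normal = NormalDist()
    chisq = [std_normal.inv_cdf(p / 2) ** 2 for p in pvalues]
    expected_median = std_normal.inv_cdf(0.25) ** 2
    return median(chisq) / expected_median
```

Under this definition, values close to 1, such as the reported 1.023 and 0.977, indicate that the bulk of the test statistics follow the null distribution.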

Characterization of EWAS results
The 33 suggestive top-ranking CpG sites (i.e., p<5×10⁻⁵) identified in our main EWAS analysis (total infection sum score) were carried forward for biological and functional characterization. Of these, 13 (39.4%) have been linked to mQTLs, suggesting that they are at least partly under genetic control. By contrast, only 3 (9.1%) were linked to gene expression (i.e., eQTMs), indicating limited associations with the expression levels of the top hits in blood (Table S3). Supplementary Text D and Table S3 describe the trimester-specific results. A look-up of the genes annotated to the suggestive top-ranking CpG sites (Figures S8-S11) showed that these genes have been identified in prior GWASes of infections such as SARS-CoV-2 and hepatitis, at suggestive or genome-wide significance thresholds. After FDR correction (q<0.05), pathway analyses did not identify significantly enriched biological processes, cellular components, or molecular functions among genes annotated to the top-ranking CpGs. The top GO terms and KEGG pathways are included in Tables S4 and S5. In the enrichment analysis for immunological gene sets, the gene set 'GSE14769_UNSTIM_VS_360MIN_LPS_BMDM_DN' was significantly enriched (p_FDR=0.041) for the results of the total prenatal infection score. Enriched genes included AMBRA1, TAPBP, VOPP1, CTU1, and PJA2 (Table S6). This gene set comprises genes differentially expressed between unstimulated bone marrow-derived macrophages and macrophages 360 minutes after lipopolysaccharide stimulation, focusing on downregulated genes. A look-up of the suggestive top-ranking CpG sites in the EWAS Catalog and EWAS Atlas indicated that most have previously been associated with inflammation-relevant traits and lifestyle factors at a suggestive significance threshold. These traits include (auto-)immune conditions, asthma, cardiovascular conditions, smoking, and obesity (Table S7).
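The gene-set results above combine an over-representation test with FDR correction. The Python sketch below illustrates one textbook version of these two steps: a hypergeometric tail probability followed by Benjamini-Hochberg adjustment. This section does not state which enrichment tool or test the authors used, and all numbers in the example are made up. Dedicated methylation-array tools, such as `gometh` in the missMethyl R package, additionally correct for the differing number of probes per gene. This plain sketch does not.

```python
from math import comb


def hypergeom_enrichment_p(k, n_hits, set_size, universe):
    """One-sided over-representation p-value, P(X >= k).

    X is the number of gene-set members among n_hits genes drawn at random
    (without replacement) from a universe containing set_size members.
    """
    total = comb(universe, n_hits)
    upper = min(n_hits, set_size)
    tail = sum(
        comb(set_size, i) * comb(universe - set_size, n_hits - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 3 of 10 hit genes fall in a 40-gene set drawn from
# a 2,000-gene universe; the resulting p-value is then FDR-adjusted together
# with two other (made-up) gene-set p-values.
p_set = hypergeom_enrichment_p(3, 10, 40, 2000)
print(benjamini_hochberg([p_set, 0.20, 0.75]))
```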

Regional-level epigenome-wide analysis
We observed no significant differentially methylated regions for either the total infection sum score or the trimester-based infection sum scores after Bonferroni correction (q<0.05).

Replication of top-ranking probes in ALSPAC
None of the 33 top-ranking EWAS probes identified in Generation R replicated in ALSPAC after multiple testing correction (q>0.05) (Tables S8-S11). Before multiple testing correction, one CpG (cg01304814) was nominally associated with prenatal infection exposure in ALSPAC, in the same direction across cohorts (β=-0.087, SE=0.034, p=0.010). Moreover, sensitivity analyses in ALSPAC showed that three CpGs (cg03987884, cg09130190, cg11170479) were associated at a nominal significance level for trimester 2, and one CpG (cg07036524) for trimester 3. None of these associations survived multiple testing correction.
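As a consistency check on the reported ALSPAC estimate, a normal (Wald) approximation to β/SE gives a two-sided p of about 0.0105. This is in line with the reported p=0.010, given that β and SE are themselves rounded. The exact test used by the authors is not stated here. With n=864, a t-test would give a very similar value. The `replicates_nominally` helper is a hypothetical encoding of "nominal significance in the same direction" and is not the authors' code.

```python
import math


def wald_p(beta, se):
    """Two-sided p-value for beta/SE under a standard normal approximation."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))


def replicates_nominally(discovery_sign, beta_rep, se_rep, alpha=0.05):
    """True if the replication estimate has the discovery sign and p < alpha."""
    same_direction = math.copysign(1.0, beta_rep) == math.copysign(1.0, discovery_sign)
    return same_direction and wald_p(beta_rep, se_rep) < alpha


# ALSPAC estimate for cg01304814 as reported in the text; the discovery
# direction is negative ("same direction across cohorts").
p = wald_p(-0.087, 0.034)
print(round(p, 4))
print(replicates_nominally(-1, -0.087, 0.034))
```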
In a sensitivity analysis in Generation R, we used an infection score more comparable to that of ALSPAC, including only two of the ten infection domains (correlation between the full and abbreviated scores = 0.66). This abbreviated score reduced the ability to identify infection-CpG associations, yielding smaller effect sizes and larger standard errors. Of the 33 top-ranking CpG sites identified in our main analyses (using the full infection score), only 6 were associated with the abbreviated infection score in Generation R at the suggestive threshold (p<5×10⁻⁵), and 16 at the nominal threshold (p<0.05) (Table S12). This suggests that the difference in infection scores may partly explain the lack of replicated CpGs in ALSPAC. Of note, the CpG site that replicated nominally in ALSPAC (cg01304814) also remained significant in Generation R when the abbreviated infection score was used.

Can we develop a methylation profile score predicting exposure to prenatal infection in cord blood, and does this score relate to offspring health outcomes?
We selected features based on a suggestive significance threshold, as this model outperformed selection based on nominal significance. The infection sum score was similarly distributed between the training and test sets in Generation R (Figure S12). Tables S13-S16 show the weights used to construct the MPS, derived from the optimal combination of alpha and lambda. In the Generation R test set, the MPSes of infections across the total pregnancy and for each trimester were significantly (p≤0.001) associated with the respective infection sum scores (incremental R² for total infections = 0.049 [4.9%]). However, the score did not associate with the infection sum scores in ALSPAC (incremental R² for total infections = 0.001 [0.1%]) (Table 3, Figures S13-S17). Moreover, the MPS did not associate with any of the child health outcomes in either cohort (Tables S17-S20).
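An MPS is a weighted sum of CpG methylation values, and its explanatory power can be summarized as an incremental R². The Python sketch below shows both computations. It assumes that incremental R² is the difference in R² between nested models with and without the MPS, a common definition that is not spelled out in this section. The weights are made up. In the study they came from the penalized model fitted on the training split, and they should be evaluated only on held-out data.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def r_squared(y: Sequence[float], X: Sequence[Sequence[float]]) -> float:
    """R^2 of an OLS fit of y on X (X includes an intercept column)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def mps(meth_rows, weights):
    """Methylation profile score: weighted sum of each child's CpG values."""
    return [sum(w * m for w, m in zip(weights, row)) for row in meth_rows]


def incremental_r2(y, covariate_rows, score):
    """R^2(y ~ covariates + score) minus R^2(y ~ covariates)."""
    base = [[1.0] + list(c) for c in covariate_rows]
    full = [row + [s] for row, s in zip(base, score)]
    return r_squared(y, full) - r_squared(y, base)


# Hypothetical test-set data: two CpGs, one covariate (child sex).
weights = [0.5, -0.2]
meth = [[0.1, 0.9], [0.4, 0.7], [0.3, 0.2], [0.8, 0.5], [0.6, 0.1], [0.2, 0.6]]
sex = [[0], [1], [0], [1], [1], [0]]
infection_score = [1, 2, 0, 3, 4, 1]

scores = mps(meth, weights)
print(round(incremental_r2(infection_score, sex, scores), 3))
```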

Is prenatal exposure to infections associated with epigenetic age acceleration at birth?
After multiple testing correction, we observed no association between prenatal exposure to infections and epigenetic gestational age acceleration estimated based on either the Bohlin or 450K/EPIC clocks in Generation R (Table S21) and ALSPAC (Table S22).

DISCUSSION
In this prospective population-based study, we examined whether prenatal exposure to self-reported clinically evident common infections (based on a cumulative score) associates with differential neonatal DNAm in cord blood, and whether this in turn relates to later offspring health outcomes. Discovery analyses were based on data from 2,367 children from the Generation R Study with trimester-specific maternal reports of infection exposure. Findings were tested for replication in 864 children from the independent ALSPAC cohort. The epigenome-wide analyses did not identify differentially methylated sites or regions at birth associated with infections during pregnancy, whether measured as a total or a trimester-specific cumulative score. Consistent with this, an aggregate MPS capturing broader infection-associated DNAm patterns did not show strong explanatory power for reported infection exposure in either cohort. Nor did it predict relevant offspring health outcomes in adolescence, including psychiatric symptoms, BMI, and asthma. Finally, based on two gestational epigenetic clocks, we observed no association between prenatal infections and epigenetic aging in either cohort. Overall, our findings suggest that cumulative exposure to common infections during pregnancy is unlikely to strongly influence offspring DNA methylation patterns in cord blood in the general population.
Prior research using preclinical models and high-risk samples has reported numerous associations between prenatal exposure to infections and offspring DNAm, highlighting its potential role as a biological mediator of downstream health outcomes (13,17,19,32-35). In contrast, we did not identify any associations between prenatal infection exposure and offspring DNAm after multiple testing correction. Further, even at a more relaxed significance threshold, we observed no overlap between our findings and the CpG sites or annotated genes identified in previous studies. Several factors may explain these apparent discrepancies.

One relates to the type of infections investigated. We focused on prenatal exposure to common infections, whereas prior studies have investigated more severe exposures: continuous high-dose exposure to viral or bacterial pathogens in preclinical studies, and severe infections such as HIV or Zika virus in human clinical studies. Both HIV and Zika virus can cross the placenta and the offspring's blood-brain barrier (67,68). The common infections we studied cannot directly cross the placenta but can instead lead to placental inflammation, thereby affecting child development indirectly (9). Furthermore, DNAm differences associated with Zika virus or HIV exposure may reflect the unique molecular mechanisms these pathogens employ during infection. Both viruses can interact with the host cellular machinery, potentially inducing changes in DNAm as part of the host response to infection. For instance, Zika virus has been shown to directly affect neural progenitor cells, leading to epigenetic modifications in the developing fetal brain (22,31). HIV, being a retrovirus, integrates its genetic material into the host genome, a process that can affect DNAm (19,32,33). Common infections may thus lack the distinctive pathogenic features that induce significant alterations in the offspring epigenome during fetal development.

In our study, the use of a cumulative infection score may also have obscured pathogen-specific associations with offspring DNAm. By using a cumulative score rather than examining individual infections, our EWAS targeted the shared pathway through which these infections may collectively affect child neurodevelopment, rather than the unique effects of each infection. Moreover, our infection score relied on self-reported data rather than directly measured infection load. Self-reported data are susceptible to recall bias and may not capture all instances of infection accurately, which may have contributed to the null findings in our EWAS. In particular, the variability and severity of infections may not have been fully captured or quantified, potentially obscuring associations between infections and offspring outcomes. Future research employing more objective measures of infection, such as laboratory-confirmed diagnoses or biomarkers, could provide more precise insights into the effects of infections on outcomes such as DNA methylation. Further research is also needed to clarify how infection severity, chronicity, and type (among commonly occurring infections) affect the fetal epigenome; our study was underpowered to examine these characteristics.
Other methodological differences may also explain the discrepant findings. These include differences in (i) sample size and type, (ii) adjustment for confounders, and (iii) multiple testing correction. First, our population-based study included considerably more participants than prior work in smaller, selected samples, which may have reduced susceptibility to false-positive associations. Second, confounders such as maternal tobacco use or socioeconomic status have not typically been taken into account in prior studies and may have influenced previously reported findings. Third, we applied stringent multiple testing correction, which has been applied unevenly in the extant literature. The use of a replication cohort in our study further enabled us to evaluate the robustness and generalizability of our findings.

Nevertheless, larger-scale multi-cohort investigations with biological immune markers (to confirm infection exposure) will be necessary to identify potentially more nuanced or infection type-specific associations with offspring DNAm. Such multi-cohort efforts can, however, be challenging: cohorts differ in how, when, and which types of infections are recorded, which complicates harmonization. Our study illustrates this well. The cumulative infection scores of Generation R (10 infection domains assessed) and ALSPAC (2 domains assessed) differed too much for a meta-analysis. These differences likely also contributed to the poor replication of the suggestive associations identified in Generation R, as supported by sensitivity analyses comparing the full and abbreviated infection scores in Generation R. It is also possible that prenatal exposure to infections associates not with DNAm in cord blood but with DNAm in other tissues, such as the brain, which is not accessible in vivo. Alternatively, it may act through other epigenetic mechanisms, such as microRNAs and histone modifications (13,19). These are also potentially important, but currently under-researched, mediators of (prenatal) environmental effects on offspring health.

Moreover, understanding the impact of prenatal infections during different trimesters of fetal development is crucial for elucidating their potential influence on neonatal health. In our study, we did not observe significant DNA methylation changes in neonates exposed to prenatal infections in any trimester. As with prenatal tobacco smoking (69,70), sustained exposure to infections throughout pregnancy may result in more adverse outcomes than exposure during a specific period such as early pregnancy. In other words, because we did not identify any trimester-specific pattern, any associations may be driven by cumulative exposure across all trimesters rather than by a single trimester. Our analyses using the total score of prenatal infections also did not yield significant findings, but this score is not a direct measure of chronicity or of repeated exposure across all trimesters. Future research should aim to integrate multi-trimester assessments of diverse prenatal exposures with comprehensive DNA methylation profiling. This would help to better understand the combined effects of these exposures on neonatal health outcomes and to pinpoint potential sensitive periods.
Overall, our EWAS analyses provide weak evidence for an association between infections during pregnancy and DNAm in cord blood. Nevertheless, DNAm at 33 CpG sites showed suggestive associations, and several of these sites map to genes involved in immune and (neuro)developmental processes. One of these genes, MYT1L (CpG site: cg25376660), is expressed in neuronal tissue. It has been linked to multiple neurodevelopmental and psychiatric outcomes, including intellectual disability, autism spectrum disorder, schizophrenia, and attention deficit hyperactivity disorder (71,72). It has also been linked to spontaneous preterm birth, IL-8 levels, type 2 diabetes, and systolic blood pressure (73). Another gene linked to our suggestive sites, STARD3 (CpG site: cg00264346), has previously been associated with cardiovascular disease, metabolic markers (e.g., HDL cholesterol and apolipoprotein A1), asthma, and inflammatory markers (e.g., leukocyte or neutrophil count) (73). One CpG site (cg01304814) showed some evidence of replication in the ALSPAC cohort, with a nominal association in the same direction as in Generation R. This site is annotated to PRKAR2A, which encodes a regulatory subunit of cAMP-dependent protein kinase A and is involved in mediating anti-inflammatory cytokine production. In prior GWASes, this gene has been associated with cortical thickness, sulcal depth, depressive symptoms, intelligence, educational attainment, bone mineral density, BMI, and cholesterol (73). Although these associations did not survive epigenome-wide multiple testing correction, they may still point to interesting targets for future research.
Prior studies have found that MPSs for inflammatory markers such as CRP may capture more sustained inflammation than serum levels of these markers, and that they show utility as predictors of disease risk (24,26,28,29,74,75). Motivated by these findings, we sought to develop an MPS of prenatal infections based on cord blood DNAm at birth. The score showed the expected positive associations with measured infections in the Generation R test set (i.e., internal validation). However, it did not associate with measured infections in ALSPAC (i.e., external validation). The MPS also did not predict offspring health outcomes (psychiatric symptoms, asthma, and BMI) in early adolescence in either cohort.

The contrast between well-performing MPSs for CRP and our MPS of prenatal infection could reflect several factors. First, discovery sample sizes differ: MPSs of CRP are based on EWAS meta-analyses of ~8,000-22,000 participants (24,26,74). Second, EWASs of CRP have identified numerous genome-wide significant CpG associations, providing many more epigenetic signals. Third, our MPS may be overfitted, because our whole discovery sample was used for the EWAS and only later split to create the MPS. Fourth, the designs differ. MPSs for CRP are typically derived using weights from cross-sectional EWAS analyses, in which DNAm and CRP are measured at the same time point. Our study, in contrast, prospectively associated prenatal infection exposure with cord blood DNAm. Furthermore, inflammatory markers reflect not only infection exposure but also other pre-pregnancy or prenatal maternal factors, such as obesity, chronic stress, or inflammatory disorders (76,77). Finally, MPSs of CRP are based on a single biomarker. We instead aimed to create an MPS of a dimensional score encompassing several types of common infections, because we hypothesized similar pathways through which these infections may affect child development. Interestingly, a previous study within the Generation R Study found clear phenotypic associations between this cumulative infection score and (mental) health problems between ages 1.5 and 14 years (12). This suggests that the score itself works as a risk marker, but that its effects are unlikely to be mediated by cord blood DNAm, at least to the extent captured by our MPS.
As a last step, we examined whether prenatal infection exposure relates to epigenetic clock estimates, because a previous study reported an association between prenatal exposure to HIV and accelerated epigenetic aging in offspring (33). We observed no associations across gestational epigenetic clocks or cohorts. One potential explanation is the different route of exposure. HIV can be transmitted vertically, directly crossing the placenta and the offspring's blood-brain barrier. It may therefore have more direct effects in cord blood than common infections with pathogens that cannot directly cross the placenta, such as influenza virus. Besides differences in infection severity, timing may also play a role: the prior study examined the period between late childhood and adolescence, whereas we examined epigenetic gestational age at birth. DNAm is temporally dynamic, and accelerated epigenetic (gestational) aging at birth may not carry the same meaning as accelerated epigenetic aging later in life, where it is generally considered a marker of aging and disease risk. Future longitudinal studies with repeated DNAm assessments will be needed to clarify whether and how prenatal infection exposure relates to epigenetic age at different developmental stages.
Our results should be interpreted in light of several limitations. First, data on prenatal infections were gathered once per trimester, and no concurrent blood measurements were available to quantitatively validate the reported infections. Because infections were recalled retrospectively at the end of each trimester, the use of self-report questionnaires introduces the possibility of reporter bias. At the same time, self-report questionnaires present certain advantages. Biological measurements require a visit to the research center and may capture markers with a short half-life, whereas these questionnaires could be completed at home at any time. This flexibility reduces the likelihood of healthy volunteer (selection) bias, in which participants experiencing an infection might be less inclined to attend a research visit. Moreover, infections were assessed at three time points. This reduces the measurement error that can arise when a single assessment covers the full pregnancy.

Second, 8 of the 10 domains of the Generation R infection sum score were not available in the ALSPAC cohort. The scoring method and the number of infection types assessed therefore differed between the two cohorts, limiting direct comparison and replication. To explore how these differences may have influenced our results, we generated an abbreviated infection score in Generation R that included the same two domains as in ALSPAC. Using this abbreviated score indeed decreased our ability to detect associations at the suggestive sites identified in our EWAS. However, an MPS based on the abbreviated infection score in the Generation R Study still performed relatively well in the test set (internal validation) but did not associate with measured infections in ALSPAC. This suggests that differences in the scores only partly account for the lack of replication.

Third, we were limited in our ability to study how severity (beyond including fever in our cumulative infection score), chronicity, and type of infection affect the association between infections during pregnancy and DNAm levels at birth. Future studies should investigate these aspects, including the impact of specific types of common infections during pregnancy on neonatal DNA methylation. For example, in urinary tract infections (UTIs), the causative bacterium and the severity and spread of the infection may influence associations with offspring DNAm and downstream (neuro)developmental outcomes.

Fourth, although our sample was substantially larger than those of prior studies, we may have been underpowered to detect more subtle associations between prenatal infection exposure and epigenetic patterns at birth.

Fifth, future studies should examine the potential role of additional factors, such as antibiotic and other medication use, in the association between prenatal infection and DNAm. They should also consider other potentially relevant outcomes, such as eczema.

Finally, white blood cell proportions are strongly correlated with DNAm levels in cord blood. These cells are also biologically important in the context of prenatal infection exposure, which can elevate white blood cell levels that may in turn mediate the effects of infection on offspring. We therefore performed our EWAS both with and without adjustment for white blood cell proportions. These analyses yielded highly comparable findings. This may suggest either that cell type proportions do not influence epigenetic associations with prenatal infection exposure, or that the currently available cell type estimation panels for cord blood lack sufficient resolution to identify relevant cell subpopulations. Future studies should consider using more comprehensive cell type panels, such as those provided by EpiDISH, to capture cell type composition in cord blood at a more granular level and to conduct cell type-specific EWASes. Expanded cell type reference panels for cord blood, such as panels that differentiate between white blood cell subpopulations, would also help. They would enable a better characterization of the presence and specificity of infection-DNAm associations across a broader range of cell types.

CONCLUSION
In conclusion, this large, multi-cohort effort found limited evidence for an association between self-reported clinically evident infections during pregnancy and DNAm in cord blood in the general population. Infection exposure was based on a cumulative score encompassing different types of common infections. This held at the level of individual DNAm sites, regions, broader methylation profiles, and epigenetic clock estimates. Larger multi-cohort meta-analyses will be needed to detect subtle associations with greater power. Such analyses would also enable the investigation of individual infection types that may have pathogen-specific effects, and the validation of reported infections with biological markers of infection exposure. Exploring different tissues (across development) and other epigenetic processes will also contribute to a better understanding of the potential association between prenatal infections and the fetal epigenome, and of its downstream effects on offspring health.

Table 1. Baseline characteristics

Table 2. Total infection sum score and DNA methylation at birth (suggestive hits) in the discovery dataset (Generation R cohort). Of note, linear regression models were adjusted for child sex, maternal age at delivery, maternal education, maternal tobacco use, parity, batch effects (sample plate), and cell type proportions.

Table 3. Methylation profile scores for infections and infection sum scores. Of note, the suffix _abbreviated indicates the abbreviated infection score in Generation R with the same two domains as in ALSPAC. Moreover, linear regression models were adjusted for child sex, maternal age at delivery, maternal education, maternal tobacco use, parity, batch effects (sample plate), and cell type proportions.