Epigenome-wide association study of bronchopulmonary dysplasia in preterm infants: results from the discovery-BPD program

Bronchopulmonary dysplasia (BPD) is a lung disease in premature infants caused by therapeutic oxygen supplemental and characterized by impaired pulmonary development which persists into later life. While advances in neonatal care have improved survival rates of premature infants, cases of BPD have been increasing with limited therapeutic options for prevention and treatment. This study was designed to explore the relationship between gestational age (GA), birth weight, and estimated blood cell-type composition in premature infants and to elucidate early epigenetic biomarkers associated with BPD. Cord blood DNA from preterm neonates that went on to develop BPD (n = 14) or not (non-BPD, n = 93) was applied to Illumina 450 K methylation arrays. Blood cell-type compositions were estimated using DNA methylation profiles. Multivariable robust regression analysis elucidated CpGs associated with BPD risk. cDNA microarray analysis of cord blood RNA identified differentially expressed genes in neonates who later developed BPD. The development of BPD and the need for oxygen supplementation were strongly associated with GA (BPD, p < 1.0E−04; O2 supplementation, p < 1.0E−09) and birth weight (BPD, p < 1.0E−02; O2 supplementation, p < 1.0E−07). The estimated nucleated red blood cell (NRBC) percent was negatively associated with birth weight and GA, positively associated with hypomethylation of the tobacco smoke exposure biomarker cg05575921, and high-NRBC blood samples displayed a hypomethylation profile. Epigenome-wide association study (EWAS) identified 38 (Bonferroni) and 275 (false discovery rate 1%) differentially methylated CpGs associated with BPD. BPD-associated CpGs in cord blood were enriched for lung maturation and hematopoiesis pathways. Stochastic epigenetic mutation burden at birth was significantly elevated among those who developed BPD (adjusted p = 0.02). Transcriptome changes in cord blood cells reflected cell cycle, development, and pulmonary disorder events in BPD. While results must be interpreted with caution because of the small size of this study, NRBC content strongly impacted DNA methylation profiles in preterm cord blood and EWAS analysis revealed potential insights into biological pathways involved in BPD pathogenesis.


Background
Lungs of preterm infants born at 24-36 weeks of gestational age (GA) are undergoing late canalicular and saccular phase development including widening of distal airways to prepare for the subsequent formation of alveoli, differentiation of type 1 and 2 alveolar epithelial cells, and thinning of the air-blood barrier [1,2]. Premature infants born during this GA frequently require oxygen (O 2 ) supplementation in the neonatal intensive care unit (NICU) for an extended time. Bronchopulmonary dysplasia (BPD) is a leading cause of chronic respiratory morbidity among survivors of preterm birth with the greatest risk for those born at 23-30 weeks GA [3,4]. BPD is diagnosed if an infant has a need for oxygen therapy to maintain blood oxygen level at 36 weeks postmenstrual age (PMA) for babies born before 32 weeks GA or at 56 postnatal days of age for babies born at or after 32 weeks GA [4]. The epidemiology of BPD continues to demonstrate that birth weight and GA are the most predictive risk factors for developing BPD, and the frequency of BPD has been approximately 40% in surviving infants born at ≤ 28 weeks GA and about 30% in infants with birth weight < 1000 g during the 20 years from 1993 to 2012 [5].
While knowledge about O 2 -induced BPD pathogenesis has significantly improved, the mechanisms that lead to the disease are not fully understood. Genetics, prenatal conditions, postnatal factors, and the interaction with O 2 therapy are thought to drive disease development. Advances in supportive therapy including antenatal/postnatal corticosteroids and surfactant replacement have improved survival rate and greatly reduced the respiratory distress syndrome and prominent pathology including disruption of immature lung structures (e.g., diffuse airway damage, smooth-muscle hypertrophy), neutrophilic inflammation, and parenchymal fibrosis, sometimes referred to as 'old' BPD [6], in the patients. This 'old' BPD was observed in preterm infants who were exposed to aggressive mechanical ventilation and high concentration of O 2 and has largely disappeared [6].
However, many extremely preterm and very low birth weight neonates who received O 2 therapy had no apparent lung damage but displayed a new pattern of lung developmental disorder ('new' BPD) [7]. These 'new' BPD patients showed less, or were free, of the 'old' BPD pathology but displayed altered lung growth (i.e., arrested alveolarization with larger, simplified alveoli, disturbed pulmonary microvascular development) and an increase in elastic tissue proportional to the severity and duration of the respiratory disease before death [3,[7][8][9]. Development of the 'new' BPD is thought to be attributed to the prenatal exposure factors (e.g., chorioamnionitis, maternal smoking) and/or genetic/heritability factors critical for surfactant synthesis, vascular development, and inflammatory regulation [10][11][12][13]. The 'new' BPD lacks a definitive cure, and the persistent lung impairment leads to long-term pulmonary outcomes including functional abnormalities and increased risk for adverse respiratory symptoms (e.g., airway diseases, exercise intolerance) in BPD survivors [14][15][16][17].
During the last decade, genome-wide association (GWAS) analysis in different ethnic groups has revealed several potential genetic risk factors [13,[18][19][20] and transcriptomic studies have looked for biomarkers of susceptibility and early signs of pathology [21][22][23][24][25]. In addition, epigenome-wide DNA methylation studies have reported associations of epigenetic profiles with GA at birth (cord bloods) [26][27][28][29][30][31], with prematurity complications at discharge [32], and with BPD (autopsy lung tissues) [33]. Experimental studies have revealed DNA methylome-associated genes in BPD pathology [33][34][35]. These studies suggest that methylation profiles in cord blood might prove useful for detecting acquired risk factors, susceptibilities or nascent disease. However, prospectively determining the methylomic landscape of BPD pathogenesis in cord blood at birth is challenging because the developing fetal hematopoietic system is highly dynamic, with cell-type composition and DNA methylation strongly correlated with GA and/or birth weight [36].
Using DNA methylation analysis of cord blood DNA, we investigated association of GA and birth weight with the estimated distribution of cord blood cell types, particularly nucleated red blood cells (NRBCs), in a pilotsize cohort of preterm infants with or without BPD. We describe changes in methylation-based estimates of blood cell-type composition in relation to GA and birth weight. After adjusting for covariates (GA, birth weight, cell-type proportions, etc.), we identify differentially methylated CpGs and genes associated with the BPD epigenome. Furthermore, we also determine stochastic epigenetic mutation (SEM) load relative to BPD and examine methylation-based estimates of epigenetic GA (EGA) in BPD.

Demographics and association with clinical outcomes
After applying exclusion criteria, a total of 107 premature infants with methylation data were included in the study. Diagnosis of BPD was made for 14 preterm infants (13%) based on the NIH consensus definition of BPD (see Methods) and O 2 needs at either 36 weeks PMA or at 28-56 days of postnatal life based on GA [4]. Table 1 and Additional file 1: Table S1 present the maternal characteristics, fetal complications, and outcomes of the prematurely born neonates diagnosed as BPD or non-BPD. BPD was diagnosed for 22.5% of males and 7.5% of females. Infants with BPD were born at significantly lower mean birth weight (925.9 g vs 1187.9 g) and GA (27.6 weeks vs 30.3 weeks) compared to non-BPD infants (Table 1). Birth weight was significantly (r 2 = 0.38, p < 1.0E−09) associated with GA (Fig. 1A). Supplemental O 2 therapy was given to stabilize premature infants' breathing pattern, blood O 2 levels, and to support sufficient growth and weight gain. We observed a significant negative correlation (r 2 = 0.32, p < 1.0E−09) between GA of infants and their cumulative days of O 2 supplementation in the NICU (Fig. 1B). Low birth weight infants showed a similar association with longer O 2 supplement (r 2 = 0.23, p < 1.0E−07, not shown); however, it was notable that ten neonates with birth weight < 1000 g did not require supplemental O 2 to maintain blood O 2 levels. In agreement with many other studies [5,10,37], it is evident from these data that extremely low birth weight, low GA infants are more likely to need supplemental O 2 and to develop BPD.

Methylation-based estimation of cord blood cell-type composition
The methylation-based estimated percentages of CD8 + T cells, CD4 + T cells, B cells, natural killer cells, monocytes, granulocytes, or NRBCs in cord blood DNA did not differ significantly between BPD and all non-BPD infants. We also observed no cell-type differences relative to the non-BPD infants who did not require supplemental O 2 ( Fig. 2A, Additional file 1: Table S2). Figure 2B  gradual shifts relative to birth weight but even the largest preterm infants have more NRBCs and fewer granulocyte than full-term infants (mean birth weight = 3453 g) ( Fig. 2B far right bar, estimated cell-type composition for full term infants extracted from our recent study, Bergens et al. [38]). Among all preterm infants, there was a significant positive correlation between birth weight and CD4 + T cell percentage (r 2 = 0.183, p = 4.3E−06) suggesting a linear relation with hematopoiesis and fetus size. GA was also correlated with CD4 + T cell percent (r 2 = 0.14, p = 7.4E−05) but not as highly. NRBCs, immature red blood cells with nuclei containing genomic DNA, occur at substantial frequencies in cord blood. Our analysis estimated high average NRBC levels in the smallest, lowest GA infants (Fig. 2B). The estimated proportion of cord blood DNA contributed by NRBCs ranged from 5.7 to 97.2% (mean = 29.9%, SD = 24.1%) and a relatively large number (n = 18) of preterm neonates had > 50% NRBC DNA in their blood (Fig. 2B, Additional file 1: Table S2). The mean of 29.9% is similar to the reference mean value of 25% for infants born at less than 30 weeks GA [57]. We thus assessed if NRBC proportions affected DNA methylation levels by comparing genome-wide profiles from high and low NRBC cord blood samples. Figure 2C shows overall DNA methylation density profiles (~ 450,000 CpGs) from the 10 highest (brown) and the 10 lowest (blue) NRBC cord blood samples. The distribution of β-values (estimate of methylation level) was markedly dissociated between the low-and high-NRBC samples. Moreover, a reduction Association between birth weight, gestational age and oxygen supplementation in preterm neonates. A There was a significant correlation between gestational age and birth weight among all preterm infants (n = 107). BPD neonates had significantly lower birth weight and gestation age compared to non-BPD. B An inverse association was observed between days of supplemental oxygen (O 2 ) therapy in newborn intensive care unit (NICU) and gestational age in preterm infants. Infants with BPD had more days of O 2 supplementation regardless of gestational age compared to non-BPD. Red circles = BPD (n = 14), blue circles = O 2 supplementation ≥ 5 days and non-BPD (n = 41). Open circles indicate preterm non-BPD neonates with 1-4 days (n = 21) or with zero (n = 31) supplementation of O 2 Fig. 2 DNA methylation-based prediction of cord blood cell-type composition. A Box and whisker plot shows percent distribution of 7 blood cell types in preterm infants with or without bronchopulmonary dysplasia (BPD). Mean indicated by open dot within box, median, 25th and 75th percentile presented. B Variation in percent cell-type distribution by birth weight quintile and comparison with that in full-term infants. C Density plot of genome-wide methylation distribution comparing profiles of the 10 highest (brown) and 10 lowest (blue) nucleated red blood cell (NRBC) samples. Yellow box highlights demethylation in high NRBC samples. D Correlation of estimated percent NRBC with birth weight in preterm infants. E Correlation of aryl-hydrocarbon receptor repressor (AHRR) smoking biomarker (cg05575921) with estimated NRBC and distribution among BPD groups. CD4T CD4 + T cells, CD8T CD8 + T cells, NK natural killer cells, Mono monocyte, Gran granulocyte in fully methylated CpGs was found in the high-NRBC bloods indicating demethylation relative to the expected methylation profiles for low NRBC bloods. Using univariate regression, there was a significant (r 2 = 0.160, p = 1.9E−05) negative correlation between estimated NRBC percent and birth weight (Fig. 2D) and a nominally significant correlation with GA (p = 0.027). Importantly, while four of the preterm neonates among the very low birth weight and very high NRBCs developed BPD (Fig. 2D), a large proportion of the high NRBC bloods were among the larger neonates, suggesting other factors in the etiology of NRBCs. DNA methylation levels of less than 75% at aryl-hydrocarbon receptor repressor (AHRR) cg05575921, the top-ranked epigenetic biomarker of smoking, are suggestive of prenatal tobacco smoke exposure [38,39], and frequency of NRBCs in cord blood has been correlated with cigarette smoking [40,41]. The blood samples with high NRBCs in our cohort were significantly correlated with lower percent methylation level of cg05575921 (Fig. 2E), which suggested recent maternal exposure to tobacco smoke in these preterm babies. The genome-wide demethylation of NRBCs (shown in Fig. 2C) did not contribute substantially to this result (see Additional file 2: Figure S1). NRBC estimation is based on a reference set of 100 NRBC-specific CpGs, and the estimates are used as covariates in EWAS to adjust for any difference in NRBCs across the disease groups. Because NRBCs have highly divergent methylation profiles, we had a concern that adjustment in multivariable regression analysis might not fully compensate for differences in NRBC levels among cord blood samples, and thus, we assessed if additional CpGs were NRBC-specific relative to the reference NRBC set. Comparing reference NRBC methylation profiles to preterm cord blood methylation profiles identified 3647 CpGs at genome-wide significance level (Additional file 1: Table S3).

Epigenome-wide association study (EWAS) of BPD in preterm infants
Robust linear regression analysis with adjustment for covariates (GA, birth weight, smoking, ethnicity/race, seven cell-type percentages, hospital at birth, and percentile of days in which O 2 supplementation used in NICU) identified 38 CpGs associated with BPD at genome-wide (Bonferroni, p < 1.12E−07), and 275 CpG sites at false discovery rate (FDR) < 1% (Fig. 3), among which 64% (176/275) CpGs were differentially hypomethylated in BPD than in non-BPD (Table 2 and Additional file 1: Table S4). At FDR 5% (Additional file 1: Table S4), 1581 CpG sites annotated to a total of 2164 genes were differentially methylated between BPD and non-BPD (874/1581 loci were hypomethylated in BPD). The 275 FDR 1% differentially methylated loci were annotated to a total of 386 nearby genes ( Table 2 and Additional file 1: Table S4). Winsorizing the DNA methylation dataset to reduce the potential impact of outliers had little effect on the detection of the differentially methylated CpGs (Additional file 1: Table S5). Of the 275 BPD CpGs at 1%FDR, only 6 CpGs overlapped with the NRBC-specific epigenome list (Additional file 1: Table S3).

BPD association with EGA acceleration
EGA is a concept introduced by Horvath [43] in which age, or in this case GA, is estimated by a model using DNA methylation data and is compared with actual recorded GA. EGA acceleration in newborns has been associated with a set of adverse conditions [28,44]. We calculated EGA using two published methods [26,28]. Comparing recorded GA with calculated EGA, we found both Bohlin et al. [26] and Knight et al. [28] methods overestimated GA for both non-BPD and BPD infants in our study (Additional file 2: Figure S2A). EGA acceleration was calculated as the residual of the linear regression of EGA on GA for each sample. Then, we carried out a multivariable linear regression analysis to test if EGA acceleration was associated with BPD status. Additional file 2: Figure S2B displays violin plots of the results of the different models and significance levels without or with adjustment for all EWAS covariates. The Bohlin model produced an EGA acceleration result suggesting a more mature value for BPD infants (adjusted p = 0.033).
A total of 275 CpGs were significantly different between BPD (n = 14) and non-BPD (n = 93) infants at FDR = 1%. Shown are 39 passing Bonferroni correction (p < 1.04E−07)-and four selected (*) among FDR CpGs are shown. Robust linear regression p value adjusted by GA, sex, birth weight, 7 cell types, percentage cumulative neonate intensive care unit oxygen days and hospital Meth mean methylation value (%), dMeth methylation difference in BPD relative to non-BPD, Chr chromosome, hg19 human genome assembly GRCh37 Full list of the significant CpGs differentially methylated between two groups at FDR 5% (1581 CpGs) is in Additional file 1:

BPD associated with increased stochastic epimutations
SEMs are outlier methylation values observed at a CpG relative to other samples in a dataset and are thought to represent an alteration of epigenetic maintenance [42,45]. SEMs have previously been linked to preterm birth [42], cancer risks [46], and aging [45], and they may be a mediator between environmental exposures and adverse health outcomes. Epigenetic mutation load (EML) values were calculated as the natural logarithm of total number of SEMs per individual [42,45]. Among non-BPD samples, SEMs were highly variable, with a range of 191 to a maximum of 75,588. First, the association between EML and BPD status or other covariates was examined by linear regression as well as biweight midcorrelation (bicor), a median-based measurement of correlation that is robust to outliers [47]. In these unadjusted analyses, EMLs were significantly higher among BPD neonates (p = 4.18E−05 for linear regression and p = 6.56E−04 for bicor, Fig. 4). Additionally, in univariate analyses EMLs were associated with birth weight, cg05575921 methylation, some cell-type percentages and cumulative days of O 2 (Additional file 1: Table S6). We further performed a robust linear regression of EMLs on BPD status with adjustment for all EWAS covariates and observed a strongly attenuated but significant association between EMLs and BPD (p adjusted = 0.02). In order to explore what the source of SEMs might be relative to other outcomes we compared the list of multiply occurring SEM CpGs (CpGs occurring 3 or more times among SEMs) with the list of CpGs associated with the NRBC reference set (3647 CpGs at Bonferroni significance). The overlap was highly significant, with 30% (1095/3647) of SEM CpGs found to be among NRBC-associated CpGs (p < 2.2E−16).

Pathway analyses for genes annotated in BPD epigenome
To gain insights into the relationship between differentially methylated CpGs and BPD pathogenesis, CpGs were annotated to the nearest genes and pathway analysis tools were applied to elucidate gene ontology, biological process, diseases, and canonical pathways of these annotated genes cutoff at both FDR 1% and FDR 5%. Importantly, many enriched functions and pathways were related to lung development (Table 3 and Additional file 1: Table S7A, Fig. 5). These included retinoic acid receptor (RAR)/retinoid X receptor (RXR) signaling (e.g., NR0B2, NCOR2, VDR, RARRES1), androgen receptor signaling (e.g., PTEN, BRCA1, DSTN, RB1, FOXP1), cell proliferation and extracellular matrix (ECM) events including BPD-associated epithelial-mesenchymal transition (EMT) [48] (e.g., RB1, RBPJ, FOXP1, AVDR, CHRM5, COL21A1, VEPH1), and lung surfactant and glycosaminoglycan metabolism and alveolarization (e.g., CTSH, AGER, FOXP1, CAVIN2, SPOCK2, CHST14, HS6ST1, ARRB1, DSTN). In addition, hematological system development and vascular disorder related processes, which are also critical to lung maturation, were associated with the BPD epigenome (Table 3 and Additional file 1: . Other genes were related to BPD-associated outcomes such as retinopathy of prematurity (ROP, e.g., GBP3, FJX1, HIPK2) and hearing loss (e.g., GJB6, TMEM63B) and mitochondrial energy metabolism (e.g., TOMM7, SLC25A26, SLC25A33, PDK1, PKM, MDH1). Enriched functions and pathways of the annotated genes from FDR 5% cutoff further expanded focal adhesion (e.g., multiple cadherin genes) and actin cytoskeleton organization (e.g., ABL2, CUL3, GAS7, SHROOM3, DBNL, RND3) signaling pathways that are critical in cellular morphogenesis and movement particularly during development (Additional file 1: Table S7A). Table 3 and Additional file 1: Table S4 also include the results from the GOmeth analysis on Gene Ontology (GO, Additional file 1: Table S7B) and Kyoto Encyclopedia of Genes and Genomes (KEGG, Additional file 1: Table S7C) databases to enrich the pathways with consideration of the different numbers of CpG sites per gene and the CpGs annotated with multiple genes. Overall, these results suggest multiple cord blood cell genes that were differentially methylated in BPD may play roles in critical cellular and molecular processes related to BPD pathogenesis.

Differential expressions of BPD cord blood epigenome markers in murine lungs
We determined if genes associated with BPD cord blood epigenome are changed in murine lungs during lung developmental process and in a mouse model of BPD. During saccular stage of lung development (embryonic day E17.5-postnatal day PND4), mRNAs and proteins of mouse SPOCK2 and AGER were highly increased with peaks toward the saccular-to-alveolar transition time at PND4 and thereafter (Additional file 2: Figure S3; Additional file 3: Figure S3B), indicating their roles in alveolar development [49,50]. Mouse CTSH expression was higher at early saccular stage with mRNA peaks at E17.5-E18.5 and protein peaks at and before E17.5 (Additional file 2: Figure S3; Additional file 3: Figure S3B), supporting its contribution to lung branching, surfactant production and secretion at saccular stage [51,52]. Hyperoxia exposure upregulated both message and protein levels of SPOCK2, CTSH, and AGER in newborn mouse lungs (Additional file 2: Figure S3; Additional file 3: Figure S3B).

Discussion
In the current study, we report a number of new observations regarding the epigenetics of preterm GA and birth weight relative to cord blood cell-type composition, including NRBCs, and describe an EWAS analysis comparing a small group of BPD neonates to non-BPD neonates. NRBCs typically compose less than 10% of non-pathologic cells in full term cord blood and are rapidly cleared from the bloodstream after birth [53,54]. Numerous reports have observed that higher NRBC levels were associated with prenatal complications such as placental dysfunction, intrauterine hypoxia, preeclampsia, asphyxia, and maternal obesity and diabetes in term and preterm infants [54][55][56][57][58] as well as later risk of unfavorable outcomes [59]. Using a methylation-based model to estimate NRBCs, we observed that lower birth weight and GA were associated with high NRBC levels, but considerable variation was observed, with some of the larger and older infants displaying very high NRBC content. NRBC proportions were, however, not associated with BPD. It has been suggested that fetal oxidative stress or hyperoxic stress caused by maternal smoking might be a driver for NRBC formation. A significant correlation between newborn venous NRBC count and the number of cigarettes smoked per day of mothers has been reported [40]. The present study provides support for this hypothesis as we observed a significant correlation between higher cord blood NRBCs and demethylation of AHRR cg05575921 (an established biomarker of maternal smoking) [60]. However, in a previous epigenetic study of full-term births we did not observe a significant relationship between maternal smoking and estimated NRBC percent or actual counts [38]. High levels of NRBCs in cord blood can potentially confound DNA methylation studies because they have an unusual genome-wide methylation profile caused by genome-wide DNA demethylation during enucleation [61][62][63]. For NRBC enucleation and maturation, histone acetylation status-dependent nuclear and chromatin condensation is known to be essential [64]. Consistent with these notions, we found genome-wide demethylation in high-NRBC cord bloods compared to low-NRBC cord bloods (shown in Fig. 2C). Comparing NRBC reference CpGs to cord blood profiles of the non-BPD individuals, we found 3647 significantly associated CpGs (Bonferroni). The present BPD-EWAS used estimated NRBC percentage as a covariate and only a small number of BPD CpGs at FDR1% (6 of 275 CpGs) overlapped with the NRBC-EWAS-associated CpGs (3647 CpGs).
GA together with birth weight are the most important predictors for neonate morbidity and mortality. Many recent studies indicate the association of cord blood DNA methylation profiles with GA at birth [26][27][28][29][30][31]. In non-BPD samples, we conducted EWAS with covariate adjustment on GA (378 CpGs at Bonferroni) and on birth weight (3 CpGs at Bonferroni) (Additional file 1: Table S9). No CpGs among the EWAS analyses of GA A total of 273 genes were significant varied (*Bonferroni and/or false discovery rate < 0.05) between BPD (n = 6) and non-BPD (n = 16) cord blood cells as determined by cDNA microarray analysis (Illumina HumanHT-12 WG-DASL V4.0 R2 expression beadchip). Ingenuity, Reactome, and ToppGene pathway analyses tools used to determine enriched functional categories and pathways. dEx = expression difference (Log2) in BPD relative to non-BPD. Full list of the differentially expressed genes are in Additional file 1: Table S8 and birth weight were in common with the BPD Bonferroni EWAS CpGs. However, at FDR 1% we found 5 CpGs (cg02236679, cg11791427, cg16762386, cg17514088, cg19595760) overlapped between EWASs on BPD and GA. In addition, we found 225 CpGs overlapped between the present EWAS on GA and two recently published EWAS on GA [26,27].
Considering the importance of GA in neonatal health outcomes, cord blood DNA methylation has been incorporated into predictive GA models [26,28], and the discrepancy between GA estimated from DNA methylation (epigenetic maturity, EGA) and clinically recorded (chronological GA, determined by ultrasound or last menstrual period) is termed EGA acceleration [43]. Several studies have reported that lower GA acceleration values (i.e., epigenetically less mature than their clinical GA) were associated with maternal factors such as gestational diabetes in a previous pregnancy, Sjögren's syndrome, and maternal Medicaid (vs private insurance), as well as postnatal surfactant or steroid use, longer days of assisted ventilation, lower birth weight, and BPD development [28,44,65]. More mature or accelerated EGA has been correlated with maternal age of over 40 years, previous pregnancy, preeclampsia, or maternal steroid treatments [28,44,65]. In the current study, we measured EGA using models developed by Bohlin [26] and Knight [28]. In each case the EGA models overestimated GA and indicated accelerated EGA (more mature) in neonates that went on to develop BPD, a result opposite of a previous report in which infants with accelerated EGA were less likely to develop BPD [65]. While the present study is limited by size, the determination of EGA acceleration and its relationship to developmental and perinatal factors and adverse respiratory outcomes would appear to need more study.
Epigenetic drift that leads to stochastic epimutation is a recent concept defined as outlier methylation levels at a genomic position relative to the interquartile range of methylation values determined for all samples in a dataset. It has been proposed that SEMs reflect loss of epigenetic regulation and may be involved in aging and carcinogenesis [66]. Spada et al. [42] observed in a small study that total SEM burden (both hypomethylated and hypermethylated SEM) was significantly greater in preterm cord blood samples relative to full-term, and proposed weak maintenance of epigenetic state in preterm blood might be related to risk of disease in later life. Epimutation load levels (SEM load per individual) in the current study were strongly associated with a number of covariates in univariate analysis including birth weight, several cell types, and days of oxygen therapy, all covariates that were strongly associated with altered methylation profiles. Indeed, in both unadjusted (p = 5.36E−09) and adjusted (p = 1.89E−06) regression analysis, NRBC estimates were strongly associated with EMLs. Adjusting for all covariates in multivariable regression on BPD status, we observed a strongly attenuated level of significance (p = 0.03) for difference in EML among BPD infants relative to non-BPD infants suggesting a very large impact of cell-type composition. Spada et al. [42] reported that SEMs in cord blood occurred at CpG sites, and genes (NCOR2, PLCH1, FOXK1, and IGF2BP1) that were in common with those annotated to EWAS CpGs on preterm birth; however, these analyses were not adjusted for NRBC percentages. While the biological significance and source of stochastic epimutation in preterm cord blood are largely unknown, our finding that a very large proportion of CpGs (30%) that we observe as SEMs were among the CpGs observed as NRBC Bonferroni CpGs suggests that determination of SEMs may be strongly confounded by cell-type composition.
It is unknown if epigenetic profiles in cord blood cells represent or mirror those in the neonatal lung developing BPD; however, the study of Merid et al. [27] reported that 78 CpGs overlapped between GA EWAS analyses in cord blood and fetal lung tissues, suggesting this may be the case. The EWAS result in the present work revealed 275 CpGs significantly associated with BPD risk at 1% FDR. Examining the genes annotated to these CpGs, we found potentially important signal transduction pathways and biological functions related to BPD risk, including lung development-associated pathways and functions.
The lung is an active organ for platelet activation and a pool for hematopoietic progenitor cells that can migrate to and repopulate the bone marrow and contribute to hematopoietic lineages in blood [67]. One of the hematopoietic pathways associated with the BPD epigenome was angiogenesis and platelet activation (Fig. 5), and genes associated with runt-related transcription factor 1 (RUNX1), AHR and VEGF receptor pathways were predicted to play key roles. Regarding genes associated with lung development, CTSH was annotated to the most significant CpG associated with BPD (cg23328237) and was reported to be differentially methylated and expressed in BPD lungs compared to control lungs [33]. CTSH plays a role in surfactant production [51] and bone morphogenetic protein 4 (BMP4)-mediated lung branching [52]. The same CpG is annotated to RASGRF1, a gene that has been associated with BPD in GWAS [18]. We found several annotated genes that are critical to lung glycosaminoglycan metabolism and contribute to alveolarization and hematopoiesis [68]. SPOCK2, the other GWAS-determined BPD susceptibility gene, was upregulated in a rat lung model of BPD [13] and use of lung-specific SPOCK2 overexpression mice demonstrated its deleterious role in BPD development [49]. In addition, AGER, a specific alveolar type 1 cell differentiation marker [50], and HS6ST1 encoding heparin sulfate 6-O-sufotransferase 1 [69] involve in alveolar development. BPD epigenome-annotated genes were also associated with androgen receptor signaling which delayed alveolar maturation and increased respiratory morbidity in preterm male infants [70], suggesting a molecular basis of male susceptibility of newborn pulmonary morbidity and possibly BPD [71]. Vitamin A regulates lung growth (e.g., branching, proximal-distal patterning, alveolar septation, surfactant production) and supports the immunity and repair of injured respiratory epithelium [72,73]. Among the RAR-RXR and RXR-vitamin D receptor (VDR) signaling genes enriched, NCOR2 is a transcriptional corepressor in various developmental signaling [74] and has been a commonly determined gene in multiple EWAS for GA prediction [26,27] as well as in epigenetic mutation CpG in preterm birth [42]. It is also known to affect later life lung function in chronic respiratory conditions [75]. BPD has also been associated with vitamin D receptor (Fok1) polymorphisms [76] and with lower levels of 25-hydroxyvitamin D in preterm infants [77,78]. Overall, pathway analyses indicated that epigenome changes in cord blood immune and progenitor cells may in part anticipate the lung pathogenesis in BPD patients.
There are several limitations to this study, the most obvious being the small number of BPD patients included in the analysis and this restricts interpretation to the generation of new hypotheses. The logistics of obtaining cord blood samples from extremely low gestational age neonates are challenging and contributed to limiting this study to a pilot scale rather than a full-size investigation. Some studies of preterm chronic lung disease have observed that inflammatory conditions such as chorioamnionitis are associated with changes in celltype counts, which might be indicators of susceptibility to BPD [79]. Study size limitation and availability of data did not allow investigation of this question. At the time of the enrollment of patients, it was not known that leukocyte cell counts, NRBC counts and cell-type proportions would be important covariates in an EWAS and some potentially useful data was not abstracted from original medical records. The inclusion of estimated cell-type proportions as covariates has been the standard approach to reduce the possibility of confounding by leukocyte cell-type variability. The highly dynamic nature of hematopoiesis in the developing fetus, including the presence of DNA from NRBCs in cord blood, make DNA methylation studies in preterm neonates a challenge. The present study is one of only a very small number of published epigenetic studies of preterm birth and BPD, and this epigenetic exploration of cell-types in relationship to GA, birth weight, and BPD provides a unique view of the developing neonate.

Conclusions
Although limited by small sample size, the current investigation provides an exploratory basis for examining potential cord blood DNA methylation biomarkers of BPD risk in preterm infants and offers descriptive comparisons between methylation profiles and various preterm phenotype variations. Further studies are needed to determine if the CpGs identified here will prove to be clinically relevant. A future project using epigenomic and transcriptomic profiling of preterm infant blood in the early weeks of life will examine the persistence of the results described here.

Study cohort
The Discovery-Bronchopulmonary Dysplasia Program (D-BPD) cohort is described in detail elsewhere [80]. Briefly, 378 preterm infants under 1500 g of birth weight in Buenos Aires, Argentina, were recruited within 13 days of life and followed prospectively in the NICU until discharged or 44 weeks of corrected GA. Inclusion criteria were: born alive at any of the four participating hospitals at ≤ 35 weeks' gestation; with very low birth weight (< 1500 g at birth); and free of cyanotic heart disease, congenital anomalies of the respiratory tract (i.e.: tracheoesophageal fistula, pulmonary hypoplasia, diaphragmatic hernia), ocular malformations, congenital immune suppression, or severe malformations affecting breathing or vision (i.e. anencephaly). Infants who died prior to completion of all the first maternal questionnaire were excluded from participation. A total of 107 patients (14 BPD, 93 non-BPD) who satisfied all study inclusion criteria provided a cord blood sample and had a successful methylation array. BPD was diagnosed for infants who received at least 28 days of O 2 (> 21%) supplementation therapy and need for O 2 (≥ 30%) and/or positive pressure (1) at 36 weeks of PMA or at discharge (whichever comes first) if born < 32 weeks GA or (2) at 28-56 days postnatal age or at discharge (whichever comes first) if born ≥ 32 weeks GA [4]. However, two infants receiving 14 or 22 days of O 2 at the time of their death were diagnosed with severe BPD. The study was approved by the local Institutional Review Board (IRB) and the NIEHS (08-E-N159). Parents provided written informed consent.

Genomic DNA and total RNA extraction from cord blood
Umbilical cord blood samples were collected at birth in PAXgene reagent (Qiagen Inc., Valencia, CA) and snap frozen at − 80 °C. Samples were processed with PAXgene Blood miRNA Kit (PreAnalytix/Qiagen) following the manufacturer's procedure. Briefly, blood specimens were incubated at room temperature for 2 h to lyse RBCs and centrifuged (3500g, 15 min) to acquire cell pellets. The pellets were washed and treated with proteinase K at 55 °C (800 rpm, 15 min), and isopropanol was added to the soluble fractions of the supernatants prepared from the Shredder spin columns. For RNA isolation, the isopropanol precipitants were added into the PAXgene RNA spin columns and processed for DNase treatment followed by RNA extraction procedures as indicated in the manufacturer's brochure. For DNA isolation, the isopropanol precipitants prepared with the PAXgene miRNA Kit were loaded into the DNeasy Mini spin columns (DNeasy Blood and Tissue Kit, Qiagen) and followed the manufacturer's procedure. DNAs and RNAs were quantified using Qubit (Thermo Fisher, Waltham, MA) and stored at − 70 °C until used.

DNA methylation microarray analysis
Aliquots (250 ng) of genomic DNA from BPD (n = 14) and non-BPD (n = 93) cord blood cells were bisulfiteconverted using a Zymo EZ DNA Methylation (batch 1, 8 BPD, 43 non-BPD) and EZ-96 DNA Methylation Mag-Prep (batch 2, 6 BPD, 60 non-BPD) kits (Zymo Research, Irvine, CA) which use identical reagents following the manufacturer's instructions. Briefly, all samples were bisulfite converted in a thermocycler with the following conditions: 16 cycles of 95 °C for 30 s followed by a 50 °C hold for 60 min. Cleanup of converted product was then wither done on a column or with magnetic beads using the same kit reagents. Bisulfite-converted DNAs were applied to HumanMethylation450 BeadChip (Illumina, San Diego, CA) which covers over 480,000 CpG sites in human genome for genome-wide DNA methylation array analysis. The raw IDAT files of methylation arrays were read into R with the minfi package [81], and the data were preprocessed with background correction and dyebias normalization using the preprocessNoob method [82]. The combat function in sva package [83] was used to do batch ('Sample_Plate') correction on methylation array data. Prior to normalization, DNA methylation data were filtered based on the following quality control criteria, exclusion of: arrays having more than 5% failed probes (1 array); all CpG probes on the X and Y chromosomes; and any probes containing single-nucleotide polymorphism with a minor allele frequency ≥ 1% (in EUR population of the 1000 Genomes Project) within 5 nucleotides to the CpG site. We also removed 43,254 probes reported to hybridize to one or more non-target sites in the genome [84]. There were 447,246 CpG probes remaining after exclusions. Differentially methylated probes were identified by a robust linear regression analysis of M values (log ratio of beta values) on disease status (BPD vs non-BPD) with adjustment for infant sex, GA (weeks), birth weight (g), ancestry, maternal smoking status, seven estimated blood cell-type proportions, hospital at birth, and days in which oxygen supplementation was used in newborn intensive care unit. Days of O 2 supplementation were transformed to a percentile using empirical percentile transformation method (R function 'percentize' in heatmaply package, https:// www. rdocu menta tion. org/ packa ges/ heatm aply/ versi ons/1. 2.1/ topics/ perce ntize). In order to minimize the impact of outliers on the differential methylation results, a Winsorized methylation (https:// www. rdocu menta tion. org/ packa ges/ DescT ools/ versi ons/0. 99. 44/ topics/ Winso rize) dataset was created and robust linear regression with adjustment was repeated. DNA methylation array data are deposited in Gene Expression Omnibus (GEO, accession number: GSE188944).

Methylation-based cord blood cell-type prediction
Percentages of seven blood cell types (CD4 + T cells, CD8 + T cells, B cells, monocytes, granulocytes, natural killer cells, and NRBCs) were estimated using the reference DNA methylation profiles (R package 'FlowSorted. CordBlood.450 k' [85]) and Houseman deconvolution algorithm [86]. To identify additional CpGs associated with NRBCs, we assessed the association between NRBC reference CpGs and cord blood methylation profiles in non-BPD neonates.

SEM calculation
The calculation of SEM was carried out as in a previously published and validated approach [42,45]. An individual CpG having a methylation level three times the interquartile range above the third quartile or below the first quartile was identified as a SEM. Toward this end, we calculated the interquartile range (IQR) for each of the 447,246 probes. Then, SEMs were identified based on extreme methylation levels. Finally, we summed across the count of all SEMs per sample and defined the total number of SEMs of each study participant as EML. EML was not normally distributed; therefore, we used the natural log of the number of SEMs for all statistical analyses.

EGA estimation and EGA acceleration
We calculated DNA methylation-based GA using two different prediction methods [26,28] and Horvath's method for EGA acceleration [43]. The difference between the residual of the linear regression of methylation-based GA and clinically determined gestation age is referred to as EGA acceleration. Positive EGA acceleration was defined as a greater (or older) DNA methylation GA than clinical GA; negative EGA acceleration was defined as a lower (or younger) DNA methylation GA than clinical GA.

cDNA microarray analysis
Total RNA isolated from cord blood of individuals available at the time of the initial study in 2012 (6 BPD, 16 non-BPD) was amplified, labeled, and fragmented according to the manufacturer's protocol (NuGEN Technologies, Inc., Redwood City, CA) and applied to Illumina HumanHT-12 WG-DASL V4.0 R2 Gene Expression BeadChip targeting > 47,000 transcripts (Illumina, San Diego, CA) in the NIEHS Microarray core facility. Differentially expressed genes were detected using log2-transformed expression fold-change estimates with respect to the Robust Multichip Average-corrected fluorescence log-intensity levels (log 2 FC). Probe-wise log 2 FC values were tested across statistical groups through a resolution-weighted ANOVA. Significance level was accepted at p < 0.05 adjusted for multiple comparisons. Microarray data are deposited in GEO (accession number: GSE188949).

Pathway analyses
Enriched biological processes, functions and diseases, and canonical pathways for the genes associated with the differentially methylated sites (DNA methylation array) or the differentially expressed genes (cDNA microarray) were analyzed using ToppGene Suite (https:// toppg ene. cchmc. org), Ingenuity Pathway Analysis (IPA, Qiagen Inc., Valencia, CA), David Bioinformatics Resources (https:// david. ncifc rf. gov), and Reactome Pathway Database (https:// react ome. org). R package missMethyl performed GOmeth [87,88] on GO and KEGG databases to take into account the different number of CpG probes per gene and multiple gene-annotated CpGs in the pathway analysis.

Developmental mouse studies
Gene and protein expressions were determined in total RNAs and proteins isolated from lungs of CD-1 mice (Charles River, Wilmington, MA) harvested at E13.5, 15,0.5 and 17.5, and PND 0, 1, 4, 14, and 42, or after exposure to air or hyperoxia during PND 1-4 as previously described in detail [89]. All animal use was approved by the NIEHS Animal Care and Use Committee. Total lung proteins were prepared from right lung homogenates (n = 3/group) in RIPA buffer including PMSF (10 μg/ml) and protease/phosphatase inhibitor cocktail (Sigma-Aldrich, St. Louis, MO). Proteins were quantified and 80 μg of pooled proteins were separated on 10-20% Tris-HCl SDS-PAGE gels (Bio-Rad Laboratories, Hercules, CA) and analyzed by Western blotting using mouse-specific antibodies against SPOCK2 (R&D Systems Inc., Minneapolis, MN), CTSH (LSBio, Seattle, WA), AGER (Santa Cruz Biotechnology Inc, Dallas, TX) and β-ACTIN (Santa Cruz Biotechnology) (Additional file 2: Figure S3; Additional file 3: Figure S3B). The assay was done duplicates. An aliquot of total RNA isolated from mouse lungs (n = 3/group) was reverse transcribed into cDNAs, and cDNA (40 ng) was subjected to quantitative PCR in 20 μl reaction containing 0.5 μmol of commercially available (Real Time Primers, LLC, Elkins Park, PA) cDNA primers for Spock2, Ctsh, and Ager using an CFX Connect Realtime System (Bio-Rad) as previously described [89]. The relative quantification of target gene expression was calculated using the comparative quantification cycle (C q ) method by subtracting fluorescence detected C q of 18 s rRNA (5′-tacctggttgatcctgccag-3′, 5′-ccgtcggcatgtattagctc-3′) from that of target gene in the same sample (ΔC T ).

Other statistical analyses
Association between neonatal or maternal characteristics and clinical outcomes, GA and birth weight, and GA and supplemental O 2 days were analyzed by linear regression analyses (GraphPad Prism 9, GraphPad Software, San Diego, CA). One-way ANOVA or two-way ANOVA were used to evaluate the relationship between developmental age or neonatal exposure on mouse gene expressions determined by qRT-PCR. Student-Newman-Keuls test was used for a posteriori comparison of means (p < 0.05). Statistical analyses of qRT-PCR data were performed using SigmaPlot 14.0 program (Systat Software, San Jose, CA).
neonates with or without bronchopulmonary dysplasia (BPD). Table S3. Differentially methylated CpG sites associated with nucleated red blood cell (NRBC) counts. Table S4. Differentially methylated CpG sites associated with bronchopulmonary dysplasia (BPD) risk and their annotated gene information. Table S5. The number of probes identified by differential methylation analysis. Table S6. Association of stochastic epimutation load (EML) with bronchopulmonary dysplasia (BPD) and other covariates. Table S7A. Biological functions and pathways of the genes annotated to the differentially methylated CpG sites (FDR 5%) associated with bronchopulmonary dysplasia (BPD) risk. Table S7B. GOmeth enriched GO terms on CpGs associated with BPD risk. Table S7C. GOmeth enriched KEGG pathways on CpGs associated with BPD risk. Table S8. cDNA microarray determined differentially expressed genes in cord blood cells of infants with bronchopulmonary dysplasia (BPD). Table S9. DNA CpG methylation sites associated with gestational age (GA) or birth weight (BW).