Association between DNA methylation in cord blood and maternal smoking: The Hokkaido Study on Environment and Children’s Health

Maternal smoking is reported to cause adverse effects on the health of the unborn child, the underlying mechanism for which is thought to involve alterations in DNA methylation. We examined the effects of maternal smoking on DNA methylation in cord blood, in 247 mother–infant pairs in the Sapporo cohort of the Hokkaido Study, using the Infinium HumanMethylation 450K BeadChip. We first identified differentially methylated CpG sites with a false discovery rate (FDR) of <0.05 and the magnitude of DNA methylation changes (|β| >0.02) from the pairwise comparisons of never-smokers (Ne-S), sustained-smokers (Su-S), and stopped-smokers (St-S). Subsequently, secondary comparisons between St-S and Su-S revealed nine common sites that mapped to ACSM3, AHRR, CYP1A1, GFI1, SHANK2, TRIM36, and the intergenic region between ANKRD9 and RCOR1 in Ne-S vs. Su-S, and one common CpG site mapping to EVC2 in Ne-S vs. St-S. Further, we verified these CpG sites and examined neighbouring sites using bisulfite next-generation sequencing, except for AHRR cg21161138. These changes in DNA methylation implicate the effect of smoking cessation. Our findings add to the current knowledge of the association between DNA methylation and maternal smoking and suggest future studies for clarifying this relationship in disease development.

Verification of DNA methylation analysis using bisulfite sequencing. Despite being widely used in the field of epigenetics for large-scale DNA methylation profiling, HM450K and analysis of tandem HM450K have their limitations, e.g., restriction to predesigned CpG sites or complications in data normalisation and analysis 23 . Moreover, some of the selected CpG sites from our secondary comparison fall within the list of suspected low-quality probes provided by Naeem et al. 24 , namely ACSM3 cg06478823, AHRR cg21161138, and SHANK2 cg05780228 (Supplementary Tables S1, S2, and S3). Thus, we further verified our findings using next-generation sequencing (NGS) technique following bisulfite conversion. Because of their locations in GC-rich regions, suitable primers could not be designed for cg12876356 and cg09662411 located in GFI1, Therefore, GFI1 was excluded from verification experiments.
Due to some outliers in the NGS result for AHRR cg21161138, SHANK2 cg05780228, and EVC2 cg01290904 (black arrows in Supplementary Fig. S1), we chose to perform analysis both including and excluding outliers. Our NGS analysis showed that seven out of eight CpG sites exhibited similar DNA methylation profiles to that of HM450K, with the exception of cg21161138 within the AHRR gene, which showed no changes in methylation rates between the Ne-S, Su-S and St-S groups (Fig. 4, bottom panel in each target, Table 3, and Supplementary  Fig. S1). The statistical significance in NGS analysis remained the same for all sites after discounting outliers, except for cg012090904 within the EVC2 gene, which fell slightly outside the range of significance (p = 0.0526) after discounting outliers (Table 3). Correlation analysis revealed a moderate to strong positive correlation (ρ > 0.6) between DNA methylation ratio at the seven CpG sites determined by HM450K analysis and by NGS analysis, excluding AHRR cg21161138 (ρ = 0.347 and ρ = 0.331 after and before discounting outlier, respectively) ( Fig. 5 and Supplementary Fig. S2).
Because NGS provides single base-resolution, we were able to examine adjacent CpG sites falling within the region spanned by the primers (Fig. 4, upper panel in each target). Of the 16 neighbouring CpG sites, methylation patterns of 13 sites were consistent with those determined using HM450K for Su-S and St-S groups (Table 3 and Supplementary Fig. S3). Notably, multiple CpG sites near AHRR cg05575921, CYP1A1 cg05549655 and TRIM36 cg07469926 were altered by smoking and were not changed if smoking was stopped in early pregnancy (Table 3 and Supplementary Fig. S3A,S3C and S3F), suggesting that these regions are highly susceptible to smoking.

Discussion
In the current study, we conducted an EWAS using HM450K analysis in a Japanese population from the Sapporo cohort of the Hokkaido Study to address the current interest in DNA methylation alteration due to prenatal tobacco smoke exposure. Considering the beneficial effects of quitting smoking early in the pregnancy, we identified nine CpG sites located in six genes and an intergenic region, whose DNA methylation patterns were altered by smoking but recovered after early smoking cessation. However, the methylation pattern of CpG sites in EVC2 (cg01290904) was observed even after the cessation of smoking. We authenticated the findings of HM450K analysis using bisulfite sequencing, with the exclusion of CpG sites in GFI1 due to technical issues. We found a relatively strong correlation between the two approaches for the methylation status of CpG sites, except for the sites in AHRR (cg21161138). In a primary comparison between each smoking group, CpG sites that passed both criteria, including epigenome-wide significance level (FDR <0.05) and effect size (|β| >0.02), were selected. We chose a difference of 2% to narrow down the CpG sites based on a previous publication, which defined the mean methylation change for both hyper-and hypomethylation as 2% 16 . Moreover, a recent review summarised that most environmental exposure studies have reported changes in DNA methylation from 2-10% and called for focusing on small magnitude alterations 25 . In the EWAS comparison between Ne-S and Su-S groups, we replicated the CpG sites that were consistent with previous studies, particularly the hypomethylated sites in AHRR and GFI1 and the hypermethylated sites in CYP1A1 and MYO1G 13,14,16,17 . Among the 64 DM CpG sites, cg05575921 associated with AHRR was found to be consistently replicated across studies assessing DNA methylation in cord blood or newborn blood 6,13,14,17 . In detail, all CpG sites identified in GFI1, CYP1A1 and MYO1G in this study were also reported in previous studies; AHRR cg21161138 was reported by Joubert et al. 13 , Kupers et al. 14 and Markunas et al. 16 but not by Richmond et al. 17 . Another AHRR-associated CpG site, cg17924476, although not reported in studies using cord blood or newborn blood, was detected in adolescents exposed to maternal smoking 15 . Moreover, multiple DM CpG sites, including two in MARCH3, two in FLYWCH1 and three in ACSM3 were not identified in previous studies, which suggests these to be novel targets affected by prenatal tobacco smoke exposure and warrants further studies for validation. We did not detect any variation consistent with the genome-wide significant criteria in CTCNAP2 cg25949550, a CpG site that was agreeable with previous studies (data not shown). This may be explained by the differences between populations, as our study was performed in an Asian population when others were conducted mostly in Caucasian populations. Moreover, evidence suggests that genetic background and genetic variants effect differences in DNA methylation at CpG sites associated with maternal smoking 19,26,27 . Although we did not investigate genetic variants in our study, we attempted to search for single-nucleotide polymorphism (SNP) sites in CTCNAP2 and several novel genes that may affect the reported CpG sites from the Infinium HD Methylation SNP List provided by Illumina. We then compared SNP sites between Japanese and Caucasian populations using the 1000 Genomes Browser (https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/). However, we detected a higher frequency of SNP sites that may affect the methylation rate at cg05150608    only in the intergenic region between ANKRD9 and RCOR1 (Supplementary Table S4). Nevertheless, a study comparing European and African populations found that genetic variances may explain as much as half of the diversity in DNA methylation between populations; the remaining variance is thought to be due to complex gene-gene and gene-environment interactions 22 .
Considering that the negative effects of prenatal tobacco smoke exposure can be diminished if the mother quits smoking in the first trimester of pregnancy, it is vital to determine the critical window of exposure. We conducted a secondary comparison to identify CpG sites that overlapped across the different primary comparisons, with the aim to focus on CpG sites that may be specific to the St-S group. Five of the nine CpG sites identified for unaltered DNA methylation patterns if the mothers stopped smoking early during pregnancy, namely AHRR cg05575921, AHRR cg21161138, CYP1A1 cg05549655, GFI1 cg12876356 and GFI1 cg09662411 were previously reported, as discussed above. Except for AHRR cg21161138, which showed a weak correlation with HM450K and no significant difference in methylation between groups in NGS, we successfully verified the changes in DNA methylation at AHRR cg05575921 and CYP1A1 cg05549655. Two CpG sites in GFI1 gene were not verified by NGS due to technical issues. Novakovic et al. performed a detailed investigation on AHRR methylation and found no significant differences between children from women who never smoked and those who stopped smoking after confirmation of their pregnancy 19 . Similarly, Joubert et al. reported that alterations in DNA methylation occurred in the foetus only if women smoked past the 18 th week of pregnancy 28 . The authors further indicated that changes in DNA methylation patterns result from maternal smoking, and not from paternal smoking before pregnancy, or from an epigenetic inheritance of smoking effects from the grandmother to the mother. Richmond et al. found no differences at AHRR cg05575921 and CYP1A1 cg05549655 between cord blood samples that were unexposed or exposed only during the first trimester 17 . The authors also reported the same finding in GFI1, although the CpG site identified in their study (cg09935388) was different from those identified in this study (cg12876356 and cg09662411). Our findings at AHRR cg05575921 were consistent with previously published data 19 , where hypomethylated state of CpG sites neighbouring cg05575921 was observed in the cord blood samples of mothers in the Su-S and Ne-S but not among those of mothers in the St-S. In adult smokers, stopping or reducing smoking was related to the recovery of DNA methylation status at AHRR in peripheral blood 29 . A very recent study examined AHRR and CYP1A1 methylation and mRNA expression using first trimester foetal livers and placentas; significant changes in AHRR methylation were only observed in placentas from female foetus 30 . However, significant increases in the mRNA of AHRR in livers and of CYP1A1 in placentas were detected, which suggests that smoking may have an early effect on gene expression before inducing noticeable alterations in DNA methylation 30 . Thus, the question regarding of the duration of maternal smoking exposure required to make remarkable and sustained changes in the DNA methylation level is still unclear and warrants further investigations. In total, together with other studies, our results further support AHRR as an exemplary gene for the interaction between environment and DNA methylation.
Amidst the reported genes associated with smoking, in particular maternal smoking, AHRR and CYP1A1 were thoroughly investigated. Both AHRR and CYP1A1 function downstream of AHR; while AHRR represses AHR, CYP1A1 was activated by AHR 19 . Consistent with previous studies, we found a hypomethylated status of  AHRR and a hypermethylated status of the CYP1A1 promoter [13][14][15][16][17][18][19] . Moreover, it has been shown that the hypomethylation at AHRR cg05575921 is correlated with an increase in AHRR expression 31 . Notably, alterations in the DNA methylation status of both AHRR and GFI1 in cord blood contributes to low birth weight of children of mothers in the Su-S group 14,32 . Specifically, hypomethylation cg05575921 was found to associate with low birth weight 33 . As discussed in Tian et al., AHRR cg05575921 hypomethylation was correlated with an increase in AHRR expression and several effects, including repression of oxygen transport-related embryonic globin genes, inhibition of cell growth and proliferation, and suppression of angiogenesis. Although genetic polymorphisms in CYP1A1 have been implicated to affect birth weight and birth size 1 , the contribution of CYP1A1 hypermethylation in cord blood has not been fully investigated. However, we did not observe significant change in birth weights between Ne-S, Su-S, and St-S. We can only assume that our relatively small sample size contributed to the absence of a low birth weight phenotype. In addition to being recognised as a risk factor for lower birth weight and birth size in newborns, intra utero tobacco smoke exposure has been shown to increase the risk of neurodevelopmental disorders, particularly autism spectrum disorder and attention deficit hyperactivity disorder (ADHD) 34 . This notion was recently addressed in a study investigating the association between DNA methylation in the peripheral blood of ADHD children and maternal smoking during pregnancy 32 . Despite the small size of the population studied, DNA methylation at AHRR and GFI1, but not at CYP1A1, were shown to be related to the severity of conduct disorder and ADHD in children of smoking mothers. Furthermore, maternal smoking has been shown to induce hypomethylation at GFI1 in the peripheral blood of children suffering from sudden infant death syndrome (SIDS) compared with healthy children 35 , suggesting a complex triad relationship among intra utero tobacco smoke exposure, GFI1 methylation status and SIDS. In addition to the previously known CpG sites, we identified four novel CpG sites, including ACSM3 cg0647882, TRIM36 cg07469926, and cg05150608 in the intergenic region between ANKRD9 and RCOR1, which represented a hypermethylated status in the Su-S group, and SHANK2 cg05780228, representing a hypomethylated status in the Su-S group. NGS analysis revealed similar changes in DNA methylation of the adjacent CpG sites in TRIM36 and SHANK2, suggesting that these sites are susceptible to maternal smoking. A recent meta-analysis suggested maternal smoking during pregnancy as a high-risk factor for neural tube defects 36 . TRIM36 is expressed in the developing brain and is thought to be indispensable for somite formation 37 ; lack of TRIM36 due to a homozygous mutation has been reported to result in autosomal recessive anencephaly 38 . Recently, TRIM36 hypermethylation has been linked to a cell-transformation effect of Benzo[a]pyrene, a polycyclic aromatic hydrocarbon found in tobacco smoke, in bronchial epithelial cells 39 . Hypermethylation status in TRIM36 was also identified in tumours of lung, breast, liver and ovarian cancer 39 . Although no study has focused on SHANK2 hypomethylation, mutation and copy number variation of SHANK2, encoding a member of synaptic scaffolding protein in the SHANK family was reported in patients with autism spectrum disorder and mental retardation 40 . ACSM3, formerly known as an acyl-CoA synthetase medium chain family member 3, is associated with metabolic syndrome 41 . Recently, down-regulation of ACSM3 has been implicated in the poor outcome of patients with hepatocellular carcinoma 42 . Like SHANK2, little is known about the epigenetic modification of ACSM3. The function of the CpG site, cg05150608, identified in the noncoding region between the ANKRD9 and RCOR1 genes is unknown. Taken together, our findings indicate changes in DNA methylation patterns of target genes, implicating prenatal tobacco smoke exposure as a risk factor for cancer, congenital abnormalities and neurodevelopmental disorders in the unborn child. However, our observations were limited to alterations in DNA methylation of cord blood, which may not represent methylation status in the foetal brain or other organs. Further studies are awaited to reinforce the relationship between tobacco smoke exposure and disease development as well as the role of novel target genes in this process.
Secondary comparison between the CpG sites identified in the Ne-S vs. Su-S comparison and those identified in the St-S vs. Su-S comparison pointed out EVC2 cg01290904 as the only common CpG site, which was verified by NGS analysis, suggesting that this CpG site was affected by short-term tobacco smoke exposure in the first trimester. Mutation in EVC2 causes congenital skeletal dysplasia, resulting in very short stature, known as Ellis-van Creveld syndrome 43 . Recent studies on Evc2 mutant mice further implicate the importance of EVC2 in the normal development of teeth and cranial bone 44,45 . Notably, children born to actively or passively smoking mothers show higher risk of hypodontia 46 . Moreover, a slightly increased risk of oral clefts has been reported in children born to actively smoking mothers 47 . These findings suggest a striking association between maternal smoking and EVC2. However, we could not find any previous study linking EVC2 to early maternal smoking. Future studies focusing on EVC2-related outcomes and smoking exposure during the first trimester are needed to validate whether EVC2 links maternal smoking with congenital anomalies in children.
The HM450K array allows access to >450,000 CpG sites across the genome that span 99% of the RefSeq genes and 96% of the CpG islands. Thus, HM450K array is a top choice for EWAS study. HM450K utilises the detection of DM CpG sites, providing insights into the affected genes. However, it is limited to the CpG sites included in the array and does not provide information on the neighbouring sites. Moreover, it is thought that the results of HM450K analysis may vary between different arrays (batch effect) or may not reflect bona fide DNA methylation due to problematic probes and intricacies in data normalisation and/or data analysis 23 . Therefore, with the availability of NGS in our laboratory, we chose to perform NGS analysis of DM CpG sites detected by HM450K. Using this approach helped us accomplish two objectives: to verify the changes in DNA methylation status of selected CpG sites identified using HM450K array and to assess the neighbouring CpG sites contained within the range of primers. The relatively strong correlation between results obtained with the two methods, together with the high-throughput capacity and single nucleotide resolution of NGS, favours the application of NGS for assessing DNA methylation of target gene(s) in large cohort studies. Moreover, with the increasing number of studies on DNA methylation as a biomarker for smoking 29,48 , especially maternal smoking, NGS may serve as a high-throughput and reliable approach for translational studies.
To the best of our knowledge, this is the first study to report the association of cord blood DNA methylation and prenatal cigarette smoking exposure in a Japanese population. Our study has several strengths. Firstly, all participants were native Japanese, living in Sapporo city and the surrounding areas, which were presumably homogenous in term of environment, culture, and daily life. Thus, the study population limited the effect of other environmental factors and was suitable for investigation of the interaction between a specific factor (prenatal smoking exposure) and the epigenome. Secondly, previously reported CpG sites were replicated in our studies, demonstrating the robustness of HM450K. Moreover, to overcome the drawbacks of HM450K, we applied bisulfite-NGS to validate our observations. Thirdly, by focusing on the windows of exposure with the exclusion of mothers who had stopped smoking before pregnancy, we demonstrated the effect of smoking on DNA methylation in the cord blood even if the mother stopped smoking as soon as the pregnancy was confirmed.
Although data were carefully analysed and verified, we are aware of the lack of a method of quantitative analysis of maternal smoking and the possible impact this may have on the results. In this study, mother-infant pairs were categorised based on self-reported information, without measuring any biomarkers for smoking (e.g. cotinine level). This approach might result in misclassification of maternal smoking groups. Nevertheless, the inaccuracy was presumed to be small, as a previous report on a Japanese population showed a high agreement between self-reported information and cotinine measurement 4 . Another limitation of this study might be that the effect of second-hand smoking was not accounted for in our analysis, which may lead to false unexposed status. However, this underestimation in smoking exposure would reduce the significance between Ne-S or St-S and Su-S groups. Thus, the CpG sites reported in this study are likely not overstated. Lastly, our results had shortcomings: replication in a second cohort and for the relatively small population studied, especially for Su-S and St-S groups (n = 46 and 77, respectively), which may reduce statistical strength and contribute to the absence of phenotypes, for instance, low birth weight, as discussed above. Therefore, the novel CpG sites should be interpreted with caution and must be validated in future cohort studies.
In conclusion, we reported novel CpG sites and verified previously established sites associated with maternal smoking and cessation of maternal smoking upon the confirmation of pregnancy. These sites were linked with SCIEntIFIC REpoRTS | (2018) 8:5654 | DOI:10.1038/s41598-018-23772-x adverse outcomes of intra utero tobacco smoke exposure, including recently reported effects on neurodevelopment disorders and congenital defects. However, the triad relation among maternal smoking, DNA methylation alterations and certain diseases remain unclear. To address this issue, future studies on the effects of maternal smoking on DNA methylation and disease development using large-scale cohorts are needed.

Study population.
We performed a prospective study using Sapporo birth cohort from previously described Hokkaido Study on Environment and Children's Health 49,50 . In this study, 514 women, who had enrolled at 23-35 weeks of gestation and delivered at the Sapporo Toho Hospital between 2002 and 2005 in Japan, agreed to participate. All participants were native Japanese and residents of Sapporo City or surrounding areas. Participants completed a questionnaire that asked for basic information, including their dietary habits, exposure to chemical compounds in their daily life, smoking history, alcohol consumption, caffeine intake, family income and educational levels of themselves and their partners. Of the 514 participants, 295 cord blood samples were obtained. Genomic DNA of sufficient quantity and quality was obtained from 292 participants.
Genome-wide DNA methylation analysis. Umbilical cord blood samples were taken immediately after birth and stored at −80 °C. Genomic DNA was extracted from cord blood using a Maxwell ® 16 DNA Purification Kit (Promega, Madison, WI, USA). Genome-wide DNA methylation analysis was performed using the Infinium HumanMethylation 450 BeadChip (HM450K) (Illumina, San Diego, CA, USA) following the manufacturer's instructions. All analyses were performed by G&G SCIENCE CO., LTD. (Matsukawa, Fukushima, Japan).
Methylation data were pre-processed using the R/Bioconductor package minfi 51 . From the 292 mother-infant pairs, one pair was rejected on the basis of poor quality control (QC) parameters. After QC, signal intensities were normalised using functional normalisation 52 . CpG probes containing single nucleotide pair-affected probes 53 , with a detection P-value of >0.05, sex chromosomes were removed 54 . As a result, 426,576 CpG probes were included in the working set. Inter-slide differences were adjusted using ComBat 55 . Methylation values of each CpG site were represented as β-values ranging from 0 to 1 and calculated from signal intensities prior to statistical analysis 56 .
Data analysis. The 291 mother-infant pairs were categorised into three exposure classes based on the smoking history of mothers: 124 Ne-S, 46 Su-S, 77 St-S, who stopped smoking after the confirmation of their pregnancy; 44 St-S were excluded from this study because of the unknown duration of non-smoking periods. Robust linear regression analyses 57 and empirical Bayesian methods 58 were applied to determine the associations of β-value at each CpG site, adjusted for maternal age, infant sex, maternal education and surrogate variables. In addition, cell type fraction (CD4 + T cells, CD8 + T cells, granulocytes, monocytes, B cell and nucleated red blood cells) for each subject were calculated using reference-free cell mixture adjustments 59 . Statistical analyses were performed using minfi, SVA, limma and RefFreeEWAS packages in R ver. 3.2.3 and Bioconductor ver. 3.2. Differential methylated sites were identified between Ne-S and Su-S, Ne-S and St-S and St-S and Su-S. To correct for multiple testing, P-values were adjusted for FDR based on the Benjamini and Hochberg method to obtain q-values. Partial regression coefficient (β) for the magnitude of the effect on DNA methylation change and selected CpGs with an FDR of <0.05 and |β| >0.02. On specific genes, statistical significance of differences in methylation values between groups was determined using Steel-Dwass test; differences with P-value of <0.05 were considered statistically significant. Associations between DNA methylation ratios at the seven CpG sites obtained from HM450K analysis and from NGS analysis were tested using Spearman's correlation coeffcients (ρ). Bisulfite next-generation sequencing. DNA was subjected to bisulfite conversion and amplified using FastStart Taq DNA Polymerase (Roche, Basel, Schweiz). PCR primers for bisulfite PCR were designed using MethPrimer (http://www.urogene.org/methprimer/) (Supplementary Table S5). For NGS, amplicon libraries were generated using an Ion Plus Fragment Library Kit (ThermoFisher Scientific, MA, USA) and Ion Xpress Barcode Adaptors Kit (ThermoFisher Scientific). Following Agencourt AMPure XP purification (Beckman Coulter, CA, USA), individual libraries were amplified using quantitative real-time PCR, diluted and pooled in equimolar ratios. The libraries were then processed with an Ion Chef System using an Ion PG Hi-Q Chef Kit (ThermoFisher Scientific). Sequencing was performed using an Ion PGM Hi-Q Sequencing Kit (ThermoFisher Scientific) and 850 flows on an Ion 318 Chip Kit v2 (ThermoFisher Scientific), according to the manufacturer's protocol. After sequencing, single processing and base calling were performed using Torrent Suite 5.0.2 (ThermoFisher Scientific). Methylation analysis was performed using MethylationAnalysis_Amplicon plug-in v1.3 (ThermoFisher Scientific). Extreme outliers in methylation data can have significant influences on results. Therefore, outliers were removed using the Tukey method 60 . Comparison of methylation ratio among the different smoking groups was performed using Steel-Dwass test. Differences with P-value of <0.05 were considered statistically significant.
Ethics. This study was conducted with informed consent of all subjects in writing. All procedures involving human subjects were approved by the University of Yamanashi, the Hokkaido University Graduate School of Medicine and the Hokkaido University Center for Environmental and Health Science and were performed in accordance with relevant guidelines and regulations. Data availability. The datasets generated and analysed during the current study are available from the corresponding author on request.