Distinct DNA methylation profiles in subtypes of orofacial cleft

Background Epigenetic data could help identify risk factors for orofacial clefts, either by revealing a causal role for epigenetic mechanisms in causing clefts or by capturing information about causal genetic or environmental factors. Given the evidence that different subtypes of orofacial cleft have distinct aetiologies, we explored whether children with different cleft subtypes showed distinct epigenetic profiles. Methods In whole-blood samples from 150 children from the Cleft Collective cohort study, we measured DNA methylation at over 450,000 sites on the genome. We then carried out epigenome-wide association studies (EWAS) to test the association between methylation at each site and cleft subtype (cleft lip only (CLO) n = 50; cleft palate only (CPO) n = 50; cleft lip and palate (CLP) n = 50). We also compared methylation in the blood to methylation in the lip or palate tissue using genome-wide data from the same 150 children and conducted an EWAS of CLO compared to CLP in lip tissue. Results We found four genomic regions in blood differentially methylated in CLO compared to CLP, 17 in CPO compared to CLP and 294 in CPO compared to CLO. Several regions mapped to genes that have previously been implicated in the development of orofacial clefts (for example, TBX1, COL11A2, HOXA2, PDGFRA), and over 250 associations were novel. Methylation in blood correlated with that in lip/palate at some regions. There were 14 regions differentially methylated in the lip tissue from children with CLO and CLP, with one region (near KIAA0415) showing up in both the blood and lip EWAS. Conclusions Our finding of distinct methylation profiles in different orofacial cleft (OFC) subtypes represents a promising first step in exploring the potential role of epigenetic modifications in the aetiology of OFCs and/or as clinically useful biomarkers of OFC subtypes. Electronic supplementary material The online version of this article (doi:10.1186/s13148-017-0362-2) contains supplementary material, which is available to authorized users.


Background
Orofacial clefts (OFCs) are a set of common birth defects that affect roughly 15 in every 10,000 births in Europe [1]. There are three main subtypes of OFC: cleft palate only (CPO), cleft lip only (CLO) and cleft lip with cleft palate (CLP) (Fig. 1). Non-syndromic cases, which comprise around 70% of cases of cleft lip with or without cleft palate, have a complex aetiology involving both genetic and environmental factors [2].
A child born with an OFC may face difficulties with feeding, speech, dental development, hearing and social adjustment. At considerable health, emotional and financial costs, they undergo surgery in the first year of life and many need additional surgical procedures later in life. They may experience low self-esteem, psychosocial problems and poor educational attainment, and the condition can harm the emotional wellbeing of the whole family [2][3][4].
In 2012, the James Lind Alliance identified the top 10 priorities in OFC research, which includes (1) identifying the genetic and environmental causes of OFCs and (2) identifying strategies to improve diagnosis of CPO. Epigenetic data might help to address both of these.
Firstly, given the key role of epigenetic processes such as DNA methylation in embryonic development, we and others have hypothesised that aberrant epigenetic mechanisms might play a role in causing OFCs [2,5,6]. This hypothesis has been supported by data suggesting an important role for DNA methylation and other epigenetic processes in regulating normal orofacial development and OFCs in mice [7][8][9][10][11][12], but published epigenetic data for OFCs in humans is lacking. The three major subtypes of OFC (CPO, CLO, CLP) appear to be aetiologically distinct, for example, lip and palate formation occur at different times during embryogenesis, and there is a higher risk of familial recurrence of the same subtype compared with risk of recurrence of a different subtype [13,14]. Therefore, if epigenetic mechanisms play a causal role in OFC aetiology, the precise role may differ by subtype. Secondly, regardless of whether epigenetics plays any causal role in OFC development, epigenetic data 'captures' information about the underlying genetic architecture and historical prenatal environmental exposures and could therefore be a useful measure of genetic and prenatal environmental influences that do cause OFCs. Again, we might expect these epigenetic indicators to differ by subtype, reflecting differential influence by different risk factors. Thirdly, if epigenetic profiles differ by OFC subtype, epigenetic measures could be developed into a biomarker to improve diagnosis of OFCs, either pre-or postnatally. This would be particularly useful for diagnosing CPO, which is often undetected on ultrasound and can go undiagnosed after birth, resulting in impaired feeding and growth, poorer outcomes and distress for families [15].
Additionally, epigenetic data could be useful in studying later-life outcomes associated with OFCs. For example, DNA methylation in cord blood has been associated with childhood IQ [16], so future studies could explore whether methylation mediates reported associations between OFCs and poor educational attainment [4].
Alternatively, even in the absence of a mechanistic role for epigenetic processes, epigenetic data might predict later-life outcomes caused by genetic and/or environmental factors.
As a first step towards exploring the potential role of epigenetic modifications in either causing or predicting various OFC phenotypes and downstream outcomes, we were interested in whether children with different subtypes of OFCs have distinct DNA methylation profiles. In the Cleft Collective birth cohort study, we studied DNA methylation in whole blood and lip/palate tissue from non-syndromic children with CLO, CPO and CLP.

Participants
Participants were children from the United Kingdom enrolled in the Cleft Collective birth cohort study between 2013 and 2016 [17,18]. Families of a child with an OFC were invited to take part soon after the child was born. Demographic and lifestyle information for both parents was collected via questionnaire. Blood and discarded lip/ palate samples were collected at time of surgery to repair the OFC. Additional details on the surgery and OFC were collected on a surgical form. For the purposes of the current study, a sample of 150 believed-to-be nonsyndromic children (with no other known anomalies) was randomly selected. The sample was stratified by OFC subtype (CLO, CPO, CLP) resulting in 50 children per group.

Classification of OFC
Details on the cleft phenotype were collected from surgical forms completed at the time of operation and from parental questionnaires. Surgeons recorded the phenotype using either the LAHSAL or LAHSHAL classification [19], which was condensed to CPO, CLP or CLO for the purposes of this study. Parents used this simplified classification of subtype (CPO, CLP or CLO). Where data were . Further subtyping can be made according to laterality and whether the soft and/or hard palate is affected. The dark bars represent the cleft available from both sources, we compared the reported subtype and found no discrepancies.

Other variables
The child's age at biological sample collection was calculated from the child's date of birth to the date of surgery. We were interested in whether children with different OFC subtypes have different rates of development, so we also predicted the child's 'epigenetic age' using blood methylation data and the method developed by Horvath [20] (discussed in more details in Additional file 1). ' Age acceleration' was calculated as the residuals from a linear regression of epigenetic age on actual age at sample collection. A positive value corresponds to an individual whose epigenetic age is ahead of their actual age, and viceversa. Sex was initially assumed by staff at the Cleft Collective using the child's name and later confirmed by parental questionnaires, where available, and NHS Digital data if explicit consent was held. Mothers self-reported how much they smoked around the time of conception, and this was classified for the purposes of this study as any or no smoking around conception. Additionally, a score to predict in utero/early-life smoke exposure was calculated from the child's blood DNA methylation data. The score was calculated as previously described [21] using a weighted sum of methylation beta values at 26 maternal-smoking-associated methylation sites identified in cord blood [22]. The efficacy of this score for predicting maternal smoking is discussed in Additional file 1. Information on the mother's occupation was dichotomised as either non-manual skilled work or manual/unskilled/no work. Information on the mother's education was dichotomised as achieving a university degree/above or not achieving a university degree. Information on parity was dichotomised for this study as no previous children or one or more previous children. Maternal and paternal age in years were reported by the mother and treated as continuous variables. Maternal and paternal ethnicity were reported by the mother and used to deduce child ethnicity as white or other. For each model, surrogate variables were generated using the sva package [23,24] in R [25] to capture residual variation associated with technical batch and cellular heterogeneity. The number of surrogate variables (10) was estimated by the sva algorithm using the methylation data and the model matrices. Blood cell type proportions were also estimated using the Houseman method [26,27] for use in a sensitivity analysis (there is no appropriate reference panel to use the Houseman method to estimate cell type proportions for the lip/palate tissue).

DNA methylation
Blood and either lip or palate tissue samples were available for each of the 150 children in this study. The orofacial tissue type was dependent on the OFC subtype; therefore, lip samples were available for children with CLO and palate samples for children with CPO. Of the 50 children with CLP, 43 contributed a lip sample and just seven contributed a palate sample. To allow us to make pairwise comparisons between all three subtypes using the same tissue type, we carried out our main analyses on blood samples.
Upon arrival at the Bristol Bioresource Laboratories (BBL), whole-blood samples were immediately separated by centrifugation into white blood cell and plasma aliquots before storage at −80°C. Lip/palate tissue samples were stored at −80°C in RNAlater. DNA from white blood cells and tissue samples was extracted, and genome-wide DNA methylation was measured using the Illumina Infinium HumanMethylation450 BeadChip platform. Blood and tissue samples were randomised over different batches. Data were pre-processed in R version 3.3.2 with the meffil package [28]. Functional normalisation [29] was performed in an attempt to reduce the non-biological differences between probes. Blood and tissue samples were normalised together. Of the original 150 blood samples, three failed quality control due to a mismatch between reported and methylation-predicted sex (and additional data from NHS Digital or parental questionnaire was not available to cross check). Of the original 150 lip/palate tissue samples, two lip samples failed quality control due to assay failure.
Epigenome-wide association studies (EWAS) were conducted using data from the blood as described below. For these studies, we removed 944 probes that failed quality control in meffil and a further 1058 probes that had a detection P value >0.05 for >5% of samples. Finally, we removed 11,648 probes mapping to the X or Y chromosomes and 65 SNP probes included on the array for quality control purposes. This left 472,792 probes in the dataset for the EWAS. Extreme outliers in the blood and tissue methylation data were identified using the Tukey method (<1st quartile−3 × IQR; >3rd quartile+3 × IQR) and set as missing. The median number of samples removed per probe was 0 (IQR 0 to 1; range 0 to 72). Methylation data were reported as beta values, ranging from 0 (completely unmethylated) to 1 (completely methylated).

Statistical analysis
We assessed the association between parental and child characteristics and cleft subtype using chi-squared or t tests. We also used linear regression to explore whether 'epigenetic age' (age predicted using the blood methylation data) differed from true age at sampling and whether any deviation (age acceleration) was associated with OFC subtype or any parental characteristics.
Epigenome-wide association studies (EWAS) were conducted in R version 3.3.2 [25]. For our main EWAS analyses, we used linear regression to model cleft subtype as the exposure and untransformed blood methylation beta values as the outcome. To identify blood methylation profiles specific to each subtype, we made three pairwise comparisons: CPO compared to CLP (CPOvsCLP), CLO compared to CLP (CLOvsCLP), CPO compared to CLO (CPOvsCLO). All models were adjusted for sex because previous studies have found different sex ratios for OFC subtypes. In order to adjust for technical batch effects and cellular heterogeneity, we calculated surrogate variables and included these in all models. We also conducted a sensitivity analysis using chip ID to adjust for batch and Houseman-estimated cell proportions to adjust for cellular heterogeneity. For the results from the CPOvsCLO and the CPOvsCLP EWAS analyses, we removed age-related CpGs as described below. P values were corrected for multiple testing using the Bonferroni method and a threshold of 0.05, i.e. an uncorrected P value threshold of 1 × 10 −7 . Regression coefficients are interpreted as the difference in mean methylation beta value in children with one subtype compared to children with another subtype.
In addition to the EWAS analyses at individual CpGs, we also used Comb-P [30] to detect differential methylation across larger regions of the genome. This approach is statistically more powerful and has been associated with a lower rate of false positive findings compared to EWAS at individual CpGs [31]. Using genomic location and P values from our individual CpG EWAS results, Comb-P identifies regions that are enriched for low P values. It then calculates and adjusts for auto-correlation between those P values using the Stouffer-Liptak-Kechris correction and performs Sidak correction for multiple testing. Differentially methylated regions (DMRs) were defined as regions fulfilling these criteria: (1) contains at least two probes, (2) all probes within the region are within 1000 base pairs of at least one other probe in the region, and (3) the Sidak-corrected P value for the region is <0.05.

Special consideration of age at sampling
Biological samples were collected at first surgery, which is typically around 3-6 months after birth for lip repair and 6-18 months after birth for palate repair. Therefore, we anticipated that the children with CLO and CLP would be younger than the children with CPO. Previous studies have shown that age, particularly during this early developmental period, is strongly associated with methylation [32][33][34]. We refrained from adjusting for age at sampling because it is not a true confounder (it cannot plausibly cause OFC subtype). Instead, we considered it a nuisance variable and dealt with it by 'filtering out' any age at sampling-related CpGs from our main analysis. To do this, using all the participants in our sample, we ran an EWAS of age at time of blood sampling, and for any age-associated CpGs (uncorrected P value <0.05), we set the EWAS P values from the CPOvsCLP and the CPOvsCLO analyses to 1. This meant that we were filtering out age-related CpGs from our main EWAS results while maintaining the same multiple testing burden and array structure for the region-based analysis. In the age-at-sampling EWAS, child's age in months was modelled as the exposure with methylation as the outcome. The model was adjusted for 10 surrogate variables for technical batch and cellular heterogeneity. We confirmed that our age-at-sampling EWAS was effectively identifying age-related CpGs (independent of OFC subtype) by inspecting heterogeneity statistics from a meta-analysis of three separate age-atsampling EWASs run within each OFC group (more details in Additional file 1).

Functional analysis
To explore the function of any OFC-associated DMRs, we used the missMethyl [35] R package to test for enrichment of any gene ontology (GO) classification terms or Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathways. This method corrects for biases in the genomic coverage of the Illumina Infinium HumanMethylation450 BeadChip array. We also looked up gene annotations from our DMRs in recently curated lists of OFC-related genes from (1) the DisGeNET database of diseases and related genes from human, rat and mouse studies [36] and (2) a bioinformatics study of OFC-related genes in human and animal studies published by Funato et al. [37].

Comparison to DNA methylation in lip/palate tissue
We postulated that DNA methylation in tissue at the OFC site might most closely represent DNA methylation in the developing orofacial tissues; therefore, we were interested in the correlation between DNA methylation in the blood and in tissue at the site of the OFC at our top DMRs. We calculated within-subjects blood-tissue Spearman's rho correlation coefficients for each CpG in each DMR. Correlation was assessed before and after the methylation data were adjusted for the top ten principal components (to capture cellular heterogeneity and technical batch). To calculate principal components, the methylation data were first split into separate datasets for blood, lip and palate samples and the R function prcomp was applied to the 10,000 most variable probes. The methylation datasets were then adjusted for their principal components using the Limma package [38] in R. Additionally, we conducted an EWAS (in the same way as described above in blood samples) to compare CLO to CLP in lip samples. We did not conduct an EWAS of CPOvsCLP in palate tissue due to only having seven palate samples for children with CLP. We did not conduct an EWAS of CPOvsCLO in tissue because it would be impossible to disentangle differences due to OFC subtype from differences due to tissue.

Characteristics of participants
From 150 participants selected for the study, 147 passed quality control for blood and methylation data and 146 had information on all variables in the main model (subtype, age at sampling, sex). Participant characteristics are summarised in Table 1. As we expected, children with CPO were on average 4 months older than children with CLO (t test P = 1.3 × 10 −19 ) and 6 months older than children with CLP (t test P = 35.8 × 10 −14 ) because of the timing of the surgery for the lip and palate repair (Fig. 2). Accordingly, the same pattern was seen for epigenetic age, despite the weak correlation with actual age (Spearman's rho for all participants 0.7; CPO 0.5, CLO 0.3, CLP 0.3). Participants with CLP tended to have a higher epigenetic age than their actual age (mean residual 0.8 months) whereas participants with CLO or CPO tended to have a lower epigenetic age than their actual age (mean residual −0.3 and −0.5, respectively). However, the confidence intervals crossed the null and t test P values for differences between subtypes were large (ranging 0.1 to 0.9). Age acceleration was also not associated with any measured confounder (Additional file 1). Participants with CLP were more likely to be male than participants with CLO (chi-squared P = 0.02) or CPO (chi-squared P = 0.002), but there was no difference in the sex ratio between participants with CLO and CPO (chi-squared P = 0.60). According to maternal self-report of smoking behaviour, participants with CPO were more likely to have mothers who smoked around the time of conception compared to participants with CLP and CLO (chisquared P value = 0.07). It should be noted that there was a particularly high level of missing data for this variable (70%). A tobacco exposure score calculated from the blood DNA methylation data was not associated with cleft subtype (chi-squared P = 0.361).

Individual CpG epigenome-wide study in blood
After the Bonferroni correction for multiple testing, there were no CpGs where blood DNA methylation was associated with either CLO (n = 48) compared to CLP (n = 49) or CPO (n = 49) compared to CLP (P > 1 × 10 −7 ). In contrast, 335 CpGs were associated with CPO compared to CLO (Additional file 2, Table S1). Sensitivity analyses adjusting for chip ID and Houseman-estimated cell types (instead of surrogate variables) did not yield substantially different results (Additional file 1). We considered that some of these associations might be better explained by differences in age than OFC subtype, so we compared the results to those of our EWAS of age at sampling (described above). There were 29,984 CpGs associated with age at sampling with an uncorrected P value <0.05 (N participants in analysis 139; Additional file 2 Table S2). Confidence that these CpGs are truly associated with age (independently of OFC subtype) comes from our observation of low heterogeneity when we meta-analysed three separate age-at-sampling EWAS run within each OFC group (more details in Additional file 1) and the fact that many of these CpGs have previously been shown to be differentially methylated with age in infancy [33]. Of the 29,984 age-associated CpGs, 214 were also associated with CPO compared to CLO with P < 1 × 10 −7 . When we 'filtered out' the 29,984 age-related CpGs by setting the P values to 1 in the CPOvsCLP and CPOvsCLO results, 121 CpGs were associated with CPO compared to CLO (P value <1 × 10 −7 ; Table 2; Additional file 2 Table S1). All subsequent analyses were performed on these results, that is, CLOvsCLP without filtering age-associated CpGs, and CPOvsCLP and CPOvsCLO with age-associated CpGs filtered. Full  Fig. 3.

Differentially methylated region analysis of blood EWAS results
When we interrogated differential methylation over larger regions of the genome, we found four DMRs in CLO compared to CLP, 17 in CPO compared to CLP and 294 in CPO compared to CLO (Sidak-corrected P value <0.05; Additional file 2 Table S3). The top DMRs with Sidak P values <0.05 and the largest regression coefficients (taken from the single-site EWAS) are presented in Table 3. Boxplots of methylation levels averaged over the top DMRs for each subtype comparison are shown in Fig. 4. None of the 25 CpGs in the four CLOvsCLP DMRs overlapped with CpGs in DMRs from the other two comparisons. Of the 82 CpGs in the 17 CPOvsCLP DMRs, 39 (48%) were also in the list of 1063 CpGs in the 294 CPOvsCLO DMRs (Fig. 5). CPO was associated with higher methylation relative to CLP or CLO at 18 of these 39 CpGs and lower methylation relative to CLP or CLO at the remaining 21/39 CpGs.

Functional analysis of DMRs
There was no enrichment (FDR-adjusted P value <0.05) for any functional categories defined using GO terms or KEGG pathways in CpGs within CLOvsCLP DMRs or CPOvsCLP DMRs. CpGs in CPOvsCLO DMRs were also not enriched for any GO terms, but were enriched for 66 KEGG pathways. However, these were mostly broad (for example, pathways in cancer, MicroRNAs in cancer, fatty acid metabolism). The top five KEGG and GO terms with the smallest P values for each list of CpGs in DMRs are presented in Additional file 2 Table S4.
According to the DisGeNET database, there were 286 genes associated with OFCs, of which 93 were also identified in a recent bioinformatics study [37] that found 357 unique OFC-related genes (total number of OFCrelated genes from the literature, 643; Additional file 2 Table S5). Of these, eight mapped to regions that were differentially methylated in CPO compared to CLO, and two mapped to regions that were differentially methylated in CPO compared to CLP in our study. These genes are detailed in Table 4.

Comparison to DNA methylation in lip/palate tissue
We were interested in the correlation between DNA methylation in the blood and in the tissue at the site of the OFC at our top DMRs. The within-subjects correlation between DNA methylation in the blood and in either lip or palate tissue was higher at CpGs in DMRs than CpGs in other regions of the genome (Table 5; Mann-whitney U P values for difference in median rhos, blood vs lip 1.6 × 10 −34 , blood vs palate 5.5 × 10 −18 ).
In an EWAS of CLO (n = 48) compared to CLP (n = 43) in lip tissue (the only EWAS we were able to conduct in tissue samples), there was no single CpG with a Bonferroni-adjusted P value <0.05 (P > 1 × 10 −7 ). However, there were 14 DMRs containing a total of 77 CpGs (Additional file 2, Table S3). This included one intergenic region near KIAA0415 that was around 4% more methylated in children with CLO compared to children with CLP in both the blood and lip tissue. The top DMRs with Sidak P values <0.05 and the largest regression coefficients (taken from the single-site EWAS) are presented in Table 3.

Discussion
In this, the first study epigenetic epidemiological study of OFCs, we found multiple genomic regions differentially methylated in blood samples from non-syndromic children with CLO, CLP and CPO. Many more regions were differentially methylated between CPO and CLO than between CPO and CLP, and more regions were differentially methylated between CPO and CLP than between CLO and CLP. This suggests that all three subtypes have distinct DNA methylation profiles, but the DNA methylation profiles of CLO and CLP are more similar to each other than the DNA methylation profile of CPO. This has important implications for OFC research, reminding us that CLO, CLP and CPO should be analysed separately and not combined into a single entity or CL/P for analysis, as is sometimes the case in epidemiological and genetics studies.
Ideally, we would have compared methylation profiles in children with an OFC to those of children of a similar age without any congenital anomalies. However, the Cleft Collective cohort is a case-only cohort, so an appropriate control group was not available. We explored several options for controls from other cohorts, including from publicly available data, but did not identify any options that would not have introduced significant confounding/bias by factors such as technical batch, age, tissue or population. In fact, when we attempted to compare our children with OFCs to blood DNA methylation from children in two publicly available datasets (Gene Expression Omnibus accession numbers: GSE62219 [33] and GSE67444 [39]), we found substantial inflation and tens of thousands of differentially methylated CpGs, indicating insurmountable confounding (methods and results described in more detail in Additional file 1). Despite the lack of controls, our finding of distinct DNA methylation profiles in subtypes of OFCs is an important first step and highlights the need for more studies to explore the potential role of epigenetic modifications in either causing or predicting different types of OFC. Firstly, the subtypes might have distinct aetiologies in which DNA methylation plays a mechanistic role, i.e. the subtypes are caused, in part, by differences in DNA methylation. For this to be the case, blood DNA methylation at the time of sampling (up to 20 months after birth) would have to closely reflect DNA methylation in the developing orofacial tissues during embryogenesis. Although this is plausible, particularly at metastable epialleles where variable DNA methylation is established early in development before cellular differentiation, we did not have sufficient data to explore this possibility. For obvious ethical reasons, it is not possible to study embryonic tissues in humans; however, we were able to study methylation levels in lip and palate tissue collected postnatally at the time of surgery. We postulated that postnatal orofacial tissues might reflect embryonic orofacial tissues more closely than blood. We calculated the within-subjects correlation at CpGs in DMRs identified in the blood EWAS. On average, correlations between blood and lip/palate were higher in regions that were associated with OFC subtype than regions that were not. This suggests that methylation at some of the sites we identified as being differentially methylated in blood correspond to methylation in the postnatal orofacial tissues, which is arguably a more relevant tissue in which to study the role of methylation in causing OFCs. However, we note that the huge changes in tissue structure and function during development mean that our assumption that postnatal orofacial methylation might accurately reflect embryonic methylation could be incorrect. Secondly, the subtypes might have distinct aetiologies explained by genetic and/or environmental factors that also influence blood DNA methylation. That is, the association between OFC subtype and DNA methylation is confounded by genotype or prenatal environmental factors such as maternal smoking or obesity. Correlations between blood and lip/palate methylation could be explained by genetic and/or environmental factors that affect methylation in disparate tissues in the same way. Regardless of whether or not DNA methylation plays a mechanistic role, our finding of distinct blood DNA methylation profiles between subtypes suggests that DNA methylation could potentially be developed into a useful biomarker for different OFC subtypes. Such a biomarker could be used to predict OFC subtype in studies where data is missing, poor or requires cross-validation. It could also be used clinically to improve diagnosis of OFCs and thereby reduce the rate of poor outcomes associated with late diagnosis of CPO. Methylation is on a continuous scale and might therefore be able to capture phenotypes such as submucosal OFCs more accurately than current clinical classifications. Further work is warranted to explore whether OFC-associated DNA methylation in infancy is associated with later surgical, health, developmental and psychological outcomes. If so, a DNA methylation-based biomarker for OFC subtype might be useful in identifying individuals who may develop poorer outcomes and benefit from more intensive monitoring and/or therapy.
Thirdly, since OFCs form early in embryonic development and blood DNA methylation was measured in infancy, it is possible that the OFC subtype could   indirectly influence blood DNA methylation, that is, any association between OFC subtype and DNA methylation could be explained by reverse causation. For (hypothetical) example, children with CPO or CLP might have more difficulty feeding compared to children with CLO and the subsequent different nutritional exposure may cause differences in DNA methylation between children with these subtypes. If DNA methylation in infancy is influenced by OFC subtype and/or severity, then methylation could be a useful predictor of downstream OFC outcomes (as discussed above). The Cleft Collective is currently collecting cord blood samples, which are unaffected by postnatal environmental factors, and will therefore help us explore the direction of any causal effect between DNA methylation and OFC subtypes. Further work is warranted to explore our findings in a causal analysis framework. However, several of our DMRs map to genes that have previously been associated with OFCs, which provides some support that DNA methylation at these genes either plays a causal role in development of OFC subtypes or reflects different genetic or environmental factors that do. For example, we identified six CpGs in a region on the gene body of TBX1 that were 3 to 8% more highly methylated in blood DNA from children with CPO compared to CLO. TBX1 encodes the T-box transcription factor 1 and deletion of this region causes chromosome 22q11.2 deletion syndrome, characterised by, amongst other malformations, cleft palate [40]. Genetic variants at TBX1 have also been associated with non-syndromic CL/P [41]. Tbx1 is expressed on the palatal shelves in mice and deletion results in abnormal epithelial fusion [42]. We also identified a region of 15 CpGs on the gene body of COL11A2 that were around 2% more highly methylated in CPO than CLO. COL11A2 is one of three distinct genes that encode collagen XI, which is expressed in the developing jaw in rats [43]. Genetic variants in COL11A2 can cause syndromic and non-syndromic palatal defects [44,45].
Amongst our identified DMRs, there were some additional gene-OFC relations that have previously been reported in the literature, but that were not included in either the DisGeNET database or the recent review of OFC genes by Funato et al. [37]. For example, rare and/or common variants have been associated with non-syndromic CL/P at regions that were differentially methylated in CPOvsCLO: FZD1 (hypermethylated) [46], VAX2 (hypermethylated) [47] and FGF12 (hypomethylated) [48]. We also found that children with CPO had lower methylation than children with CLO at a region of five CpGs near MKNK2, which has very recently been associated with non-syndromic CL/P in central Europeans [49]. However this DMR did not survive correction for multiple testing (Sidak-corrected P = 0.2). Our finding of DMRs near OFC-implicated genes is consistent with the hypothesis that these loci play an important part in OFC aetiology, with two possible explanations for the observed associations: (1) they are explained by the underlying genetic architecture (and some of the children in our study may have undiagnosed syndromes caused by these genes); (2) they are explained by nongenetic variation in DNA methylation that had a causal effect on the development of the orofacial region and is also detectable in infant blood. Either way, these observations corroborate that perturbation of gene function at these loci is important in causing OFCs.
We also found over 250 novel genomic regions associated with different OFC subtypes, including four and 14 regions differentially methylated in CLO compared to CLP in blood and lip tissue, respectively. One region, near KIAA0415, was differentially methylated in both the blood and lip, with strikingly similar regression coefficients at the seven CpGs in the region (median for the blood 0.045 [IQR 0.039, 0.051], median for the lip 0.042 [IQR 0.031, 0.043]). This is perhaps more indicative of an underlying genetic effect rather than a direct association with methylation. Few genes have previously been implicated in CLO, because most studies have not considered it as molecularly distinct from CLP.
Of the novel genes associated with CPOvsCLO or CPOvsCLP in the blood, we have selected a few that could be related to OFCs via a biologically plausible mechanism. For example, we found a region of six CpGs near MIRLET7A3 that was 8% more highly methylated in CPO compared to CLO and 5% more highly methylated in CPO compared to CLP. MIRLET7A3 encodes a microRNA precursor, and although the mechanistic role of microRNAs in human OFCs has not been fully explored, there is some evidence from mouse studies that they could be important [50]. Furthermore, a recent microarray study found that has-let-7a-5p, which is the mature sequence of the MIRLET7A3-encoded precursor, was overexpressed in plasma samples from non-syndromic children with CPO and CLP relative to unaffected controls [51]. In our CPOvsCLO comparison, we also found several novel DMRs mapping to genes that have previously been linked to neural tube closure and/or defects (NTDs), for example RGMA [52], ARHGEF1 [53] and NODAL [54], as well as two genes that have been linked to both OFCs and NTDs, CCL2 [55] and PDGFRA [56]. This is particularly interesting, because NTDs and OFCs appear to share some aetiological features: they both occur when tissues in the midline fail to fuse completely during embryonic development [57]; they co-occur in the same individuals and in related individuals more than would be expected by chance [58]; they share several environmental risk factors [2,3,59,60]. Our findings further support recent evidence of an overlap in the molecular networks associated with OFCs and NTDs [61].
Finally, two of the DMRs we identified have previously been found in association with maternal risk factors for OFCs. A region of four CpGs at HIF3A was between 3 and 15% more highly methylated in children with CPO than in children with CLO. Methylation in this region has previously been associated with measures of adiposity, most commonly body mass index (BMI) [62]. A previous study found a positive association between maternal BMI and offspring cord blood DNA methylation at the four CpGs in our HIF3A DMR [63]. Additionally, a region of two CpGs at PRPH was 5% more highly methylated in children with CPO than in children with CLO. Methylation at three (different) CpGs at PRPH has previously been negatively associated with maternal plasma folate levels [64]. These findings might indicate distinct aetiologies with different risk factors, suggesting that CPO and CLO are differentially influenced by maternal adiposity and/or maternal folate levels.
Although OFCs are one of the most common birth defects, they are relatively rare, so collecting data on large numbers of affected individuals is challenging [17]. We used data and samples collected as part of the Cleft Collective cohort study, which is a unique and valuable resource for OFC research. The prospective nature of this cohort means that future work can assess whether the subtype-associated methylation we see in infancy persists to later ages and is associated with longer term adverse outcomes of OFC such as poor educational attainment. Partly due to the novelty of these data and this resource, we were unable to find an independent cohort with similar data to replicate our findings. We hope to generate DNA methylation data for a larger sample of Cleft Collective children and test for replication in future studies. We also hope to use a larger sample to develop and explore the utility of a biomarker to predict OFC subtype.
Another potential limitation of our study is that children with CPO were on average 6 to 7 months older than children with CLO and CLP, which had a large influence on the results of the EWAS comparing these subtypes. Although we believe we were largely successful in our attempt to remove this influence by filtering out age-related CpGs using a very liberal P value threshold of uncorrected P < 0.05, there may be some residual influence. For example, three of the top 25 CpGs where there is most evidence of differential methylation between CPO and CLO (Table 2) map to two genes that have previously been reported as associated with gestational age and/or age in infancy: NFIX [32,33,65] and SNED1 [33]. However, a previous microarray study of lip tissue found lower expression of NFIX in children with CLP compared to children with CLO even though both groups were sampled at 4-months-old, which provides some evidence that NFIX may be associated with OFCs independently of age [66]. Differences in the surgical protocol for lip and palate repair mean that this limitation (of age differences between children with CPO and CL/P) is likely to be present in other studies of OFCs where samples are collected at surgery, so techniques such as the one described in this paper should be developed to attempt to overcome this. We found no evidence of association between epigenetic age acceleration and OFC subtypes. Previous studies have postulated that epigenetic age acceleration is a measure of development in children [67,68], with a positive value indicating a child who is developmentally advanced for their actual age. Therefore, our finding of no association suggests that children with different OFC subtypes have similar rates of development.
The Cleft Collective cohort is still in the recruitment stage, and genotype and gene expression data do not yet exist for the participants. This means that we were not able to infer causality between OFC subtypes and blood DNA methylation using the Mendelian randomization [69,70] or explore functionality by calculating correlations between methylation and expression. This is something we hope to do in further studies.
There was a high proportion of missing demographic data (for example, on maternal smoking, education, occupation, ethnicity and parity), which is also related to the Cleft Collective cohort being in its infancy. Participants selected for this study were recruited near the start of the recruitment phase when questionnaire return rates were lower. Our return rates have increased recently and in future work, we hope to generate methylation data for a larger sample of the cohort with more complete questionnaire data. In future studies, we hope to have sufficient data to explore more potential confounders of the methylation-subtype associations.
Finally, as mentioned above, we were unable to make comparisons with unaffected children because the Cleft Collective is a case-only cohort and we could not identify an appropriate control group that would not have introduced substantial confounding. When genotype data are available, future studies using the Mendelian randomization will be able to circumvent these issues with confounding. Additionally, the Cleft Collective is collecting data on unaffected siblings, who we hope will act as a good control sample in future studies.

Conclusions
In conclusion, we found several genomic regions differentially methylated in blood and lip samples from nonsyndromic children with CLO, CLP and CPO. Confidence in our results comes from the fact that some of these genes have been previously linked to OFCs, but we have also highlighted many novel regions. Our findings represent a promising first step in exploring the potential role of epigenetic modifications in the aetiology of OFCs and/or as clinically useful biomarkers of different types of OFC.

Additional files
Additional file 1: Supplementary methods and results. (DOCX 119 kb) Additional file 2: Additional tables. Table S1: EWAS results for CpGs differentially methylated in the CPOvsCLO comparison with Bonferroni P value <0.05. Table S2: EWAS results for CpGs differentially methylated in association with age at sampling with an uncorrected P value <0.05. Table S3: Differentially methylated regions (DMRs).