Genome-wide survey of parent-of-origin effects on DNA methylation identifies candidate imprinted loci in humans

Abstract Genomic imprinting is an epigenetic mechanism leading to parent-of-origin silencing of alleles. So far, the precise number of imprinted regions in humans is uncertain. In this study, we leveraged genome-wide DNA methylation in whole blood measured longitudinally at three time points (birth, childhood and adolescence) and genome-wide association studies (GWAS) data in 740 mother–child duos from the Avon Longitudinal Study of parents and children to identify candidate imprinted loci. We reasoned that cis-meQTLs at genomic regions that were imprinted would show strong evidence of parent-of-origin associations with DNA methylation, enabling the detection of imprinted regions. Using this approach, we identified genome-wide significant cis-meQTLs that exhibited parent-of-origin effects (POEs) at 82 loci, 34 novel and 48 regions previously implicated in imprinting (3.7−10<P < 10−300). Using an independent dataset from the Brisbane Systems Genetic Study, we replicated 76 out of the 82 identified loci. POEs were remarkably consistent across time points and were so strong at some loci that methylation levels enabled good discrimination of parental transmissions at these and surrounding genomic regions. The implication is that parental allelic transmissions could be modelled at many imprinted (and linked) loci in GWAS of unrelated individuals given a combination of genetic and methylation data. Novel regions showing parent of origin effects on methylation will require replication using a different technology and further functional experiments to confirm that such effects arise through a genomic imprinting mechanism.


Background
Genomic imprinting is an epigenetic mechanism in which genes are silenced in a parent-of-origin specific manner. The first experimental evidence for genomic imprinting was provided by investigations during the 1980s when researchers failed to produce viable mouse embryos using only the paternal or maternal genome (1). The precise evolutionary mechanisms that give rise to genomic imprinting are unknown. One hypothesis postulates that imprinting provides a mechanism through which maternal and paternal genomes exert counteracting growth effects during development with paternal genes encouraging growth and solicitation of maternal care, even at the expense of the mother's health, while maternal alleles are oriented toward success of all offspring, who do not necessarily share the same father (2). There is some empirical evidence to support this hypothesis. For example, in contrast to expression of the paternally derived insulin-like growth factor 2 (IGF2) gene that promotes cell proliferation, expression of the maternally derived CDKN1C and PHLDA2 genes act as negative regulators of this process (3).
It is widely accepted that imprinted genes are regulated by cis-acting regulatory elements, called imprinting control elements, which carry parental-specific epigenetic modifications such as DNA methylation (4). DNA methylation mainly occurs at the C5 position of CpG dinucleotides and is known to influence transcription (4). Promoter regions of imprinted genes are usually rich in CpG sites and within differentially methylated regions (DMRs) where the repressed allele is methylated and the active allele is unmethylated. Although typical imprinting of a region results in monoallelic expression of the paternal or maternal allele, studies have shown that loci can deviate from this canonical pattern and show differential expression in a parent-of-origin-dependent manner (5,6).
Multiple studies have shown that imprinted genes affect prenatal growth control, normal brain development and postnatal metabolism (7)(8)(9)(10). The monoallelic expression of imprinted loci produces genetic vulnerabilities that can lead to monogenic syndromes. In humans, abnormal imprinting patterns at specific loci can result in genetic disorders such as Beckwith-Wiedemann and Silver-Russell syndromes that primarily affect growth, and Angelman and Prader Willi syndromes which have marked effects on growth and behaviour (11). Evidence is also growing that imprinted genes may play a significant role in complex human traits. Early linkage studies found evidence that genomic imprinting was important in the genetic aetiology of mental disorders such as Alzheimer's and schizophrenia as well as Type 2 diabetes (T2D) and body mass index (12)(13)(14). More recently, large-scale genome-wide association studies (GWAS) have found SNPs within imprinted genes that exhibit parent of origin effects and are associated with traits including age at menarche, breast cancer, basal cell carcinoma or T2D (15)(16)(17)(18).
Given that genomic imprinting appears to play a role in the genetic aetiology of multiple complex phenotypes, identifying novel imprinted genes is of considerable interest. However, the extent to which genes exhibit imprinted expression throughout the human genome is unknown. The number of validated imprinted genes in humans lies somewhere between 40 and 100 according to reviews (19)(20)(21), while some databases such as geneimprint (http://www.geneimprint.com/; date last accessed January 10, 2018) and the Otago imprinting database (22) list many more that have yet to be validated. Several methods have been used to identify imprinted loci, including analysis of differential expression between parthenogenotes and androgenotes in mice (23), bioinformatic approaches that look for novel imprinted loci based on genomic features found in known imprinted regions (24), and creating gene knockouts of paternal/ maternal alleles in mice (25). More recently, whole genome scans of imprinted regions have been performed using nextgeneration sequencing technologies to measure differential gene expression between maternally and paternally derived genes using RNA-seq (26)(27)(28) or to measure differential methylation with MethylC-Seq (29). Although some of these more recent approaches have been applied to human genomes, the number of studies has been limited and constrained to small sample sizes (27,30,31), thus limiting the ability to reliably detect imprinted genes.
Imprinted regions in the human genome can also be detected using statistical approaches that model parent-oforigin effects (POEs) of genetic variants on DNA methylation and gene expression. In the presence of imprinting, SNPs affecting DNA methylation (mQTLs) or gene expression (eQTLs) have a different effect depending on their parental origin. In this work, we leverage genome-wide DNA methylation and genotypic data of up to 740 mother-child duos from the Avon Longitudinal Study of Parents and Children (ALSPAC) to identify candidate imprinted loci.
The POEs identified were remarkably consistent across the different time points (i.e. birth, childhood and adolescence), with 63 loci identified as statistically significant at at least two time points (i.e. P < 3.7EÀ10). All the remaining loci with the exception of the FAM30A locus showed at least a nominally significant parent of origin P-value (<0.05) between the SNP and methylation at the relevant CpG site at all three time points (Tables 1 and 2).
The strongest POEs were observed within loci previously implicated in imprinting. For instance, we observed partial correlations (R) as high as 0.90 between parent-of-origin coded SNPs and CpG sites near the NAP1L5 and GNAS genes. For the novel candidate imprinted loci we observed partial correlations as high as R ¼ 0.73 for a CpG near MAP2. In Supplementary Material, Tables S4-6, we have included the summary statistics of each CpG site with at least one significant SNP at each of the different time points along with additive and dominance effect statistics.
Using data from the Brisbane Systems Genetics Study (BSGS) (41,42) we tested whether each of the CpG-SNP pairs displayed in Table 1 also exhibited POEs in that cohort. We observed that 76 out of the 82 loci presented nominally significant POEs (P-value <0.05) in this dataset. Amongst these, 30 out of the 34 novel loci replicated in this independent cohort. The Pearson correlation between effect sizes of POEs of these 82 loci in BSGS and effect sizes from adolescents in ALSPAC was R ¼ 0.8 (P-value ¼2.05EÀ42) (Supplementary Material, Fig. S1).
For some of the CpG sites, we observed patterns of methylation where the effect depended on the combination of the alleles (Fig. 2). For example, the distribution of DNA methylation at the CpG probe cg24617313 near the known imprinted genes GNAS and GNAS-AS1 resembled a bipolar dominance pattern (6) where the phenotypic value of the two homozygotes did not differ, and one of the heterozygotes had a larger phenotypic value than the two homozygotes and the other heterozygote had a smaller value ( Fig. 2A). This type of pattern was also observed for some of the CpG sites near the NAP1L5, HYMAI, IGF2R, H19, IGF2, KCNQ1OT1 and IGF1R genes (Supplementary Material, Fig. S2). It is important to note, however, that these loci not only contained CpG sites showing bipolar dominance patterns, but also contained other CpGs exhibiting the canonical pattern (i.e. uniparental effects) of imprinting (Table 3; Supplementary Material, Figs S3-S8). For instance, at the locus containing NAP1L5, 7 CpG sites displayed statistically significant POEs, but only three of them resembled a bipolar dominance pattern. Most of the loci identified displayed a DNA methylation distribution consistent with uniparental effects, where one of the alleles led to a larger average phenotypic value than the other and one of the chromosomes was putatively silenced. Figure 2B shows an example of this methylation pattern, where the mean DNA methylation of the CpG probe cg09336323 near MAP2 increases only if the minor allele 'T' is inherited from the father.

Overlap with known imprinted loci for complex traits and diseases
Previous GWAS of complex traits and diseases have reported SNPs that show parent of origin specific associations. Kong et al. (16) found that rs231362 showed a parent of origin specific association with T2D. In our study, this SNP displayed a similar POE (P-value ¼3.09EÀ12) on the CpG probe cg09518720 close to KCNQ1OT1. Kong et al. also found that the SNP rs2334499 showed a parent of origin-specific association with T2D and that the association exhibited a bipolar dominance pattern. This SNP lies 300 kb away from the H19 locus where we also observed SNPs that show parent of origin specific associations and bipolar dominance patterns. A recent large-scale GWAS of age at menarche found that the SNP rs7141210 in the DLK1 gene exhibited POEs. This SNP shows similar patterns in our data at the CpG site cg18279536 close to the DLK1 gene (P-value ¼5.01EÀ35) (18). A recent genetic study of height found that SNPs within the IGF2-H19 and DLK1-MEG3 regions displayed POEs (43). However, most of the SNPs reported in that study were rare, and were thus not analysed in our study with the exception of rs7482510 where we observed a POE (P-value ¼2.81EÀ11) on the CpG site cg25742037 near the gene IGF2.

Using methylation to determine allelic transmissions
Given that many of the loci showing parent of origin effects were associated very strongly with patterns of methylation, we were interested in the extent to which patterns of methylation might be used to determine parental transmissions in heterozygous individuals. We examined the performance of a simple statistical approach to determining transmissions at loci showing evidence of imprinting through first modelling the methylation levels of homozygous individuals, and then using this information to estimate the transmission status of each heterozygous individual (see 'Materials and Methods'). Supplementary Material, Table S7 displays the accuracy by which the heterozygous genotypes groups could be inferred using methylation levels at the single most strongly associated CpG site at each locus. The median accuracy for discriminating between heterozygote groups for the 85 loci identified in this study was area under the receiver operator characteristic curve (AUC) ¼0.73 (interquartile range: 0.68-0.79) (Supplementary Material, Table S7).  Table 1). Genes within regions previously implicated in imprinting are shown in black while those ones at least 2Mbp away from these regions are shown in white circles. Although for the majority of loci, the parental origin of alleles is difficult to determine with appreciable accuracy using DNA methylation alone, it may be the case that given very large numbers of individuals, it may still be possible to detect POEs in a large GWAS study of unrelated individuals when epigenomewide association studies (EWAS) data are also present. In Supplementary Material, Table S7, we show the sample size that would be required to achieve 80% power to detect POEs at candidate loci (a ¼ 0.0005). The sample size required increased with lower AUC and lower MAF. For example, on average, an SNP inferred with an AUC $0.75 and an MAF $0.25 required a sample size 12Â larger than if the SNP was inferred with perfect discrimination (AUC ¼1). For more common SNPs (MAF >0.4) and AUC $0.75 the required sample would be 5Â larger.

Summary of candidate imprinted loci
In this work, we presented a genome-wide scan of SNPs' POEs on DNA methylation from peripheral blood at multiple time points. We found that most of the POEs of SNPs on DNA methylation are constant throughout birth, childhood and adolescence. This observation is consistent with previous studies, which showed that although patterns of DNA methylation at many CpG sites in peripheral blood cells are not stable over time, the additive genetic effects of SNPs on methylation appear to be remarkably consistent longitudinally (44). We also showed that investigating POEs on DNA methylation is a powerful method of identifying candidate regions of the genome that may be affected by genomic imprinting. This assertion is supported by the fact that most statistically significant associations in our study corresponded to known imprinted loci and that the associations were with genetic variants in cis-i.e. it is unlikely that cis effects at genes are a product of maternal or paternal effects on children's DNA methylation, as we would expect that maternal/paternal effects were distributed evenly over the genome and hence much more likely to be trans effects rather than cis effects. Interestingly we note that SNPs at the AHRR locus showed evidence for POEs, and these effects were strongest in cord blood (then at Age 7 years, then at Age 15 years). Methylation of CpG sites at this locus is known to be affected by smoking (45), and maternal smoking can induce changes in methylation at the same locus in offspring cord blood (46). However, it is unclear how maternal smoking could correlate with transmission of SNPs at the AHRR locus and thus produce evidence for parent of origin effects on methylation at this same locus. We also note that other mechanisms that could lead to the appearance of POEs in the absence of imprinting, and that we are unable to verify are trinucleotide expansions that are sensitive to the sex of the parent that transmits them (47,48).
Most of the loci identified in the ALSPAC dataset replicated in the BSGS. Specifically, 30 out of the 34 novel loci and 76 out of In addition to suggesting the existence of multiple imprinted loci that have yet to be characterized, we also found multiple examples of POEs on methylation that resemble unusual imprinting patterns (Table 3). In particular, we observed bipolar dominance patterns among some CpG sites near the insulinlike growth factors and receptors IGF1R, IGF2R and IGF2, all of which are known imprinted loci that are located on different chromosomes. Bipolar dominance patterns have been observed previously (6,15) and are hypothesized to occur when differentially imprinted genes are in tight linkage disequilibrium (LD) but exert opposing effects on the phenotype (Fig. 3). There were also other genes nearby CpG sites that resembled bipolar dominance POEs patterns including GNAS, which has been previously described to encode maternal, paternal and biallelic derived proteins (49). For each CpG site meeting experiment-wide significance, we show the SNP that produced the strongest P-value for the POE term. If more than one CpG site was located near the same gene, the one with the smallest P-value is shown. A locus is defined to be 2 Mb apart from one another. Minor alleles (MAF <50%) were used as effect alleles (EA) while the major alleles were set to non-effect alleles (NEA). Effects are summarized as partial correlations (R) between the POE coding and methylation b value at the CpG site. Parent-of-origin genotype coding was defined as À1 for heterozygotes where the minor allele was inherited from the father, 0 for homozygotes and 1 for heterozygotes where the minor allele was inherited from the mother. The gene reported is the one that is closest to the CpG site's position. P-values for the POE between the CpG and the SNP are shown for each time point. In POE pattern 'U' refers to a uniparental effect and 'B' refers to a bipolar pattern. A definition of the POE patterns is illustrated in Figure 2.
*Results where the test of association did not reach nominal significance (P-value >0.05) were not stored.
CpG BP, CpG base pair position; Birth P, Child. P and Adol. P: P-value of SNP parent-of-origin effect on the CpG using DNA methylation measured at Birth, Childhood and Adolescence, respectively.
In our analyses, we identified 48 loci within the 178 loci previously implicated in imprinting (summarized in Supplementary Material, Table S2) and 34 outside these regions, deemed novel. The fact that we did not detect all known imprinted loci could be for various reasons, including lack of statistical power, poor coverage of CpG sites in the HM450 array, or the fact that imprinted expression is not maintained in all cell types (30), and therefore we could not detect it in peripheral blood.
The strongest POE that we identified outside known imprinted regions was on a CpG site close to the Microtubule-Associated Protein 2 (MAP2) gene which plays an essential role in neurogenesis (32). Genes located near CpGs where we also detected strong POEs included DEAD/H-Box Helicase 11 (DDX11) that is involved in rRNA transcription and plays a role in embryonic development (32,50). Other interesting genes close to CpGs exhibiting POEs included MOBP, also involved in myelination, CR1 which mediates cellular binding to particles and immune complexes that have activated complement, and PCSK9, an important gene in the metabolism of plasma cholesterol (51).

Inferring allelic transmissions in unrelated individuals
We were able to infer allelic transmissions at heterozygous individuals with moderate confidence (AUC !0.8) at 31 loci. For the remaining loci, however, our predictive ability appeared to be very limited. Because of the presence of winner's curse, these  figures are likely to represent an upper limit to the predictive ability of simple approaches to resolve allelic transmission. Nevertheless, our simulations indicate that in principle POEs could be detected with this information even if allelic transmission cannot be determined with certainty given very large numbers of individuals with both EWAS and GWAS. Whilst there are no cohorts of this size that have this kind of information currently, it is possible that in the future, as the cost of microarrays decrease, these sorts of studies might be feasible, particularly in large-scale population-based cohorts like the UK Biobank where GWAS is already available (52). Alternatively, it may be possible to achieve enough power by combining cohorts with both GWAS and EWAS in a meta-analysis, as is currently being done as part of the Genetics of DNA Methylation Consortium (GoDMC). We note also that whilst we have performed power calculations using information of a single CpG per SNP, it is likely that power to detect POEs could be increased further by incorporating information from adjacent correlated CpG sites and SNPs in imperfect LD.

Strengths and limitations
To our knowledge, this is the first study to examine the evidence for POEs on human whole-genome DNA methylation. With recent technological advances and decreasing sequencing costs, the current gold standard approach to identify imprinted genes is through RNA-seq-where it is possible to quantify the expression of heterozygote alleles (28,31,53). However, this approach is still not cost-effective as it is gene expression-and SNP-dependent; thus, imprinted genes with tissue-specific expression or lacking a heterozygous exonic SNP would be missed in the very small sample sizes that are common in such studies.
In addition, such studies usually require the genotyping or sequencing of parent-child trios in order to map the transmission of the alleles. In contrast, our approach uses large-scale array data on SNPs and methylation to infer the transmission of the alleles even in absence of one parental genome. This in turn allowed us to use a large sample size that provided us with greater statistical power to detect known and candidate imprinted regions, most of which were successfully replicated in an independent dataset. Our finding of significant POEs is less likely to be explained by experimental artefacts. In contrast to traditional GWAS that test SNPs' additive effects in, e.g. a complex disease, where batch effects during genotyping may correlate with disease status, these should not correlate with (i) parental origin of the alleles and (ii) a quantitative trait such as DNA methylation. Similarly, batch effects during DNA methylation measurement and SNPs in the probe sequences that affect hybridization to the methylation array are not expected to correlate with parental transmission of genotypes. For example, in EWAS caution is recommended for cross-reactive probes (54) as these may lead to confounded findings (e.g. the association between methylation at a CpG site and a trait is the result of an association with another CpG site with a similar probe sequence). In the case of our study, measurement errors arising from these probes would distribute evenly between the heterozygote groups, as the microarray platform cannot distinguish between maternal and paternal transmissions. In addition, we are testing parent-oforigin effects of SNPs in cis to the probe and so we believe it is unlikely that the effect we see may be between the SNP and a faraway probe with a similar sequence. Nevertheless, we have removed probes that may map to other positions in the genome (54) and caution that our results, especially those at novel loci, require replication using another technology such as pyrosequencing before artefacts of the technology can be ruled out as an explanation for significant POEs.
Our approach, however, does have its weaknesses. First, we were unable to assess directly whether the identified POEs affect the expression of the genes mentioned in this study. This is particularly problematic for the novel candidate imprinted loci where there is no prior functional work to back up our assertion. The 33 novel loci found in our study were not identified in a previous large systematic analysis of imprinting across cell lines using RNA-seq (30). Nevertheless, in the latter study only 42 out of over 100 known imprinted genes were identified. There are multiple reasons to explain the lack of support of these novel loci including sub-optimal coverage and lack of power in other studies as well as the possibility that although we observe parent-of-origin DNA methylation differences, these may not translate into differences in gene expression.
The other important limitation is that we were not able to distinguish whether the allele inherited from the father or the mother is active or inactive (i.e. whether the maternal or paternal gene is silenced) as the POEs are relative, and DNA methylation seldom has a baseline of zero. For instance, taking Figure 2B as an example, we cannot distinguish between whether the DNA methylation baseline is $0.65 and the maternally inherited minor allele increases DNA methylation while the paternally derived allele remains inactive or vice versa.  (A1 and A2). In the case of the A1 haplotype, the allele encoded by the black SNP has a positive effect on the phenotype while the allele encoded by the grey SNP has a negative effect. In the case of the A2 haplotype, the allele encoded by the black SNP has a negative effect on the phenotype while the allele encoded by the grey SNP has a positive effect. In this example, genomic imprinting results in the black SNP being inactive in the chromosome inherited by the mother and the grey SNP being inactive in the chromosome inherited by the father. In panels (A) and (B), individuals who receive two copies of either haplotype A1 or haplotype A2 have a mean phenotype of 0. In panel (C) the effect on phenotype is negative as haplotype A1 is inherited from the mother and haplotype A2 from the father. In panel (D) the overall effect is positive as haplotype A2 is inherited from the mother and the haplotype A1 from the father.

Conclusion
In conclusion, we report 34 novel genomic loci that exhibit parent of origin effects and consequently may be imprinted. We also show that the pattern of association at these loci remains stable from birth to adolescence. Although our approach does not replace traditional methods to detect genes subjected to imprinting, it is a convenient and cost-effective way to narrow down the search space and prioritize candidates. Consistent with what it is known about the biological role of imprinting, many of the identified loci were within or nearby genes with known effects on traits related to growth, development and behaviour. Our results require replication using another technology (e.g. pyrosequencing) and further functional experiments to confirm that such effects arise through a genomic imprinting mechanism.

Data
Study sample ALSPAC is a geographically based UK cohort that recruited pregnant women residing in Avon (South West England) with an expected date of delivery between 1 April 1991 and 31 December 1992. A total of 15 247 pregnancies were enrolled, with 14 775 children born (55,56). Of these births, 14 701 children were alive at 12 months. Ethical approval was obtained from the ALSPAC Law and Ethics committee and the local research ethics committees. Appropriate consent was obtained from the participants for genetic analysis. Please note that the study website contains details of all the data that are available through a fully searchable data dictionary (http://www.bris.ac.uk/alspac/ researchers/data-access/data-dictionary/).
The data used in this study correspond to the mother-child pairs from the ALSPAC cohort who took part in the Accessible Resource for Integrative Epigenomic Studies (ARIES, http:// www.ariesepigenomics.org.uk/) (44,57). We used genotypic data from 740 mother-child duos, and DNA methylation data from the 740 children. Each child had DNA methylation measured at three time points-i.e. cord blood, peripheral blood (whole blood, buffy coats, white blood cells or blood spots) during childhood ($7 years) and during adolescence (15 and 17 years).

DNA methylation
Description of the DNA methylation assays can be found elsewhere (44,57). In brief, genome-wide methylation was measured using the Illumina Infinium HumanMethylation450 (HM450) arrays. These arrays were scanned using Illumina iScan, and the initial quality review was done in GenomeStudio. A wide range of batch variables were measured for each sample during the data generation, including quality control (QC) metrics from the standard control probes on the array. Samples failing QC were not included in the analysis. Data points with a low signal: noise ratio (detection P > 0.01) or with methylated or unmethylated read counts of zero were also excluded from analysis. Genotype probes in the HM450 array of the same individual at different time points were used to identify and remove sample mismatches. DNA methylation at each CpG probe was normalised using the Touleimat and Tost algorithm implemented in the R package watermelon (58) to reduce the non-biological differences between probes. We removed 30 970 CpG sites with probe sequences that substantially overlapped with other locations of the genome (54). Finally, b values (i.e. the proportion of methylation) of 437 542 CpG sites were included in the analysis.

Genotypes
Mother-child duos participating in ARIES were previously genotyped as part of a former ALSPAC study, the details of which can be found elsewhere (55,56,59). Briefly, children were genotyped on Illumina HumanHap550 quad-chip platforms by the Wellcome Trust Sanger Institute (Cambridge, UK) and by the Laboratory Corporation of America (Burlington, USA) using support from 23andMe. Mothers were genotyped on Illumina HumanHap660W quad-chip platform by Centre National de Gé notypage (É vry, FR). Standard QC was applied to SNPs and individuals. Individuals were excluded based on genotype rate (<5%), sex mismatch, high heterozygosity and cryptic relatedness [defined as identity-by-descent (IBD) >0.125]. In order to remove individuals of non-European descent, principal components (PCs) were derived from LD-pruned SNPs with MAF >0.01 using plink (60). Individuals laying 5 SD beyond the 1000 Genomes European population PCs 1 and 2 centroid were excluded. SNPs with a minor allele frequency (MAF) <1%, genotyping rate <5% or with a deviation from Hardy-Weinberg disequilibrium (pP < < 1 Â 10 À6 ) were removed from the analysis.
Genotype Imputation was performed by first phasing the genotypes using SHAPEIT V2 (61), and then imputing to the HapMap CEU reference panel using Impute (v2.2.2) (62). Genotypes were removed if they deviated from Hardy-Weinberg equilibrium P < 5 Â 10 À6 , MAF <5% (the high threshold was to minimize the possibility of low frequency variants producing chance parent of origin effects through statistical fluctuation) or imputation info score <0.8. Best guess genotypes were used for subsequent analyses. The final imputed dataset used for the analyses presented here contained 2 158 724 SNPs.

Statistical analysis
Identifying transmission of the alleles The crucial first step in identifying POEs is assigning alleles to their parental origin. In order to achieve this, we applied the duoHMM algorithm implemented in the software SHAPEIT V2 (63) to the most likely imputed genotypes from the ALSPAC mothers and children. This algorithm leverages LD and IBD sharing in order to phase genotypes and resolve the parental origin of alleles at each SNP. Using a custom written Perl script, the phased genotypes were formatted in a way such that heterozygotes where the minor allele was inherited from the mother were coded as 1, homozygotes were coded as 0 and heterozygotes where the minor allele was transmitted by the father were coded as À1. In order to confirm the accuracy of our approach to resolve the transmission of the alleles, we compared the haplotypes of the mothers and children. We observed that for each of the children, the alleles of the haplotype inferred to be the one inherited from the mother, matched to those from the mother 99.9% of the time. We attribute the 0.1% of mismatches to genotyping or imputation errors in mothers or children. This calculation assumes that phasing is 100% accurate whereas in reality there will be some errors in the haplotyping process. We note that the accuracy of phasing is extremely high when trio data is available (i.e. >99.8%; 64) and high when using thousands of unrelated individuals with dense genotyping (>98%; 65). We expect that the accuracy of phasing using mother-offspring duos is intermediate between the two and thus enabling highly accurate determination of parent of origin information. It is also important to realize that any errors in phasing will decrease power to detect POEs, but would not lead to increased Type 1 error rates.

Regression model
In order to identify SNPs in the genome displaying POEs on DNA methylation from the three time points (birth, childhood and adolescence), we employed a regression model (6,66) to estimate: the additive effect b A , defined as the equal contribution of each minor allele to the phenotype; (ii) the dominance effect b D that measures the deviation of the heterozygote from the mean phenotypic value of the two homozygotes and the parent-oforigin effect b p , which is the mean difference between heterozygotes (i.e. the heterozygote where allele 'A' is paternally transmitted, and the heterozygote where allele 'A' is maternally transmitted). In matrix annotation, with intercept term b 0 , the mean phenotypic value for each possible genotype can be modelled as: With the genotypes (AA, Aa, aA, aa) ordered (e.g. paternal first then maternal). This coding of genotypes enables testing for effects that are strictly owing to parent-of-origin effects, as under Hardy-Weinberg equilibrium the parent-of-origin vectors are orthogonal to the additive and dominant effects.
Given that DNA methylation is affected by sex and age, these factors were incorporated into the model as covariates, along with the first three ancestry informative PCs derived from genome-wide SNP genotypes in order to control for the population stratification, as well as the first 10 PCs derived from the control matrix of the Illumina HumanMethylation450 assays to control for batch effects. The following model: was fitted to the 468 512 DNA methylation CpG probes against SNPs within 500 kb from the CpG probe (i.e. SNPs in cis). SNPs beyond 500 kb from the CpG site were not assessed as it would have increased the multiple testing burden by three orders of magnitude and the number of individuals in this study may not yield enough power to detect reliable associations of SNPs in trans (44). In this model, CpG is the column vector of DNA methylation values of a CpG probe; b 0 is the intercept; b A the regression coefficient of the SNP additive effect; A is the vector of genotypes in additive coding; b D the regression coefficient of the SNP dominance effect; D is the vector of genotypes in dominance coding; b P is the regression coefficient of the SNP parentof-origin effect; P is the vector of genotypes in parent-of-origin coding; b i the regression coefficient of the covariates; and cov i are the covariates specified above. Given that DNA methylation values suffer from heteroscedasticity, White-Huber standard errors (67) were computed to estimate the significance of the POE term b P using the sandwich package in R. Partial correlations displayed in Table 1 correspond to the Pearson correlation between residuals of the CpG's DNA methylation after adjusting by the covariates described above and the parent-of-origin coded SNP.

Significance threshold
In total, $400 M statistical tests were performed. Given that neighbouring SNPs usually display a high degree of correlation between each other owing to LD, the number of independent tests was empirically estimated using a matrix spectral decomposition algorithm of the correlation matrix (68,69). We applied this algorithm in 100 randomly selected autosomal genomic regions of 1 Mb each and observed that the number of independent SNPs was 0.33 times (95% CI 0.28, 0.38) the number of SNPs tested. Hence the effective number of tests was $132 M and the Bonferroni significance threshold was set at P-value <3.7EÀ10. We note, however, that this threshold may still be conservative as the correlation between CpG probes has not been taken into account.

Replication
We used data from the Brisbane Systems Genetic Study (BSGS) (41,42) as replication sample. We employed a subset of 462 individuals from 176 families with genotypic and DNA methylation data where we were able to infer the parental transmission of the alleles. Detailed information about the BSGS can be found elsewhere (41,42). In brief, the participants were genotyped using the Illumina 610-Quad Beadchip and imputed against 1000 Genomes European ancestry population. Whole blood DNA methylation levels were measured with the Illumina HumanMethylation450 array and normalized as describe in McRae et al. (41).
Parental transmission of the alleles was inferred using the duoHMM algorithm implemented in SHAPEIT v2. A linear mixed model was fitted between each CpG-SNP pair exhibiting a statistically significant POE in the ALSPAC data shown in Table 1. For CpG-SNP pairs where the SNP was not available, a proxy SNP (R 2 > 0.8) or a nearby CpG was used instead. An additive genetic relationship matrix derived from common SNPs (MAF >0.05) was employed as random effect in the linear mixed model to control for the relatedness of the individuals. Sex, age, top five PCs derived from the DNA methylation data and top two PCs derived from the genotype data were used as fixed effects.
The mean age of the BSGS cohort was 13.8 (SD ¼2.06) and thus we compared effect sizes from POEs with the ones estimated using DNA methylation data of adolescents from ALSPAC using a Pearson correlation (Supplementary Material, Fig. S1).

Predicting parental transmission in heterozygote individuals using methylation status
During this project, we observed that DNA methylation at some CpG sites could potentially be used to infer the parental transmission in heterozygote individuals of samples without parental genotypes. Under a uniparental expression pattern of imprinting, one of the parental alleles remains inactive leading to the phenotypic mean of one of the heterozygote groups (e.g. minor allele inherited by the mother) being equal to the mean of the minor allele homozygote, while the phenotypic mean of the other heterozygote group (e.g. minor allele inherited by the father) is equal to the mean of the major allele homozygote. With this premise, we fitted a logistic model to the homozygous individuals for each of the statistically significant SNPs found in this study: where H is a vector with labels 0 for minor allele homozygotes and 1 for major allele homozygotes and CpG is the DNA methylation at the relevant CpG site.
We then used this fitted logistic model to predict the pattern of allelic transmission for each heterozygote individual at the putatively imprinted SNPs. Note that this approach can also predict the allelic transmission at other patterns of imprinting (e.g. bipolar or polar dominance) as it splits heterozygote individuals into those that are above the phenotypic mean of the (e.g. minor allele) homozygous individuals and those that are below the phenotypic mean of the (e.g. major allele) homozygous individuals. To measure how well this method performed, we computed the Area Under the receiver operating Characteristic curve (AUC) for each SNP.
We estimated the sample size that would be required to achieve 80% statistical power to detect POEs using this approach to infer parental transmission compared with having actual parental genotypes and being able to identify each heterozygote group correctly (as was the case in this study). We simulated 500 runs for each SNP where POEs explained: 0.5, 1, 2, 4 and 9% of the variance (R 2 ) using known parent-of-origin coded genotypes (i.e. 0 for homozygotes, and À1 or 1 for each of the heterozygote groups, AUC ¼ 1). We then estimated how the variance explained degraded when using the inferred genotypes coded as 0 for homozygotes and as an expected dosage for heterozygotes: P À (1 ÀP), where P is the probability of being in heterozygous group 1 and 1 ÀP the probability of being in heterozygous group 2. For example, when we simulated a POE using the known parent-of-origin coded genotypes (i.e. AUC ¼ 1) that explained R 2 ¼ 1%, the variance explained would drop to R 2 ¼ 0.09% when using the inferred (AUC ¼ 0.75) parent-of-origin coded genotypes (as expected, R 2 would normally degrade relative to AUC and MAF). We then used the function pwr.r.test from the 'pwr' package in R that implements a Z' transformation of the correlation (70) to derive the sample size required to achieve 80% power with a ¼ 0.0005.

Supplementary Material
Supplementary Material is available at HMG online.