Transcription and chromatin determinants of de novo DNA methylation timing in oocytes

Gametogenesis in mammals entails profound re-patterning of the epigenome. In the female germline, DNA methylation is acquired late in oogenesis from an essentially unmethylated baseline and is established largely as a consequence of transcription events. Molecular and functional studies have shown that imprinted genes become methylated at different times during oocyte growth; however, little is known about the kinetics of methylation gain genome wide and the reasons for asynchrony in methylation at imprinted loci. Given the predominant role of transcription, we sought to investigate whether transcription timing is rate limiting for de novo methylation and determines the asynchrony of methylation events. Therefore, we generated genome-wide methylation and transcriptome maps of size-selected, growing oocytes to capture the onset and progression of methylation. We find that most sequence elements, including most classes of transposable elements, acquire methylation at similar rates overall. However, methylation of CpG islands (CGIs) is delayed compared with the genome average and there are reproducible differences amongst CGIs in onset of methylation. Although more highly transcribed genes acquire methylation earlier, the major transitions in the oocyte transcriptome occur well before the de novo methylation phase, indicating that transcription is generally not rate limiting in conferring permissiveness to DNA methylation. Instead, CGI methylation timing negatively correlates with enrichment for histone 3 lysine 4 (H3K4) methylation and dependence on the H3K4 demethylases KDM1A and KDM1B, implicating chromatin remodelling as a major determinant of methylation timing. We also identified differential enrichment of transcription factor binding motifs in CGIs acquiring methylation early or late in oocyte growth. By combining these parameters into multiple regression models, we were able to account for about a fifth of the variation in methylation timing of CGIs. 
Finally, we show that establishment of non-CpG methylation, which is prevalent in fully grown oocytes, and methylation over non-transcribed regions, are later events in oogenesis. These results do not support a major role for transcriptional transitions in the time of onset of DNA methylation in the oocyte, but suggest a model in which sequences least dependent on chromatin remodelling are the earliest to become permissive for methylation.


Background
The establishment of DNA methylation in the female germline in mammals is essential for genomic imprinting and successful development of the embryo following fertilisation [1][2][3]. Following genome-wide erasure of methylation in primordial germ cells [4], mammalian oocytes acquire a highly structured DNA methylation landscape in which domains of uniform methylation are separated by extensive unmethylated domains [5,6]; this largely bimodal pattern is unique amongst mammalian cell types. DNA methylation is associated mostly with transcriptionally active gene bodies in oocytes, and these methylated domains contain intragenically located CpG islands (CGIs) that also gain methylation, including the germline differentially methylated regions (gDMRs) of imprinted genes [5][6][7]. As a result, there is highly programmed methylation of a defined set of ~2000 CGIs in oocytes, mostly on account of their location within active transcription units. We, and others, have shown that transcription is functionally required to define methylation in oocytes: Abrogating specific transcription events prevents methylation of the associated loci, including at imprinted gDMRs [6,8,9].
The oocyte represents a pure de novo methylation system, as an entire DNA methylation landscape is established on an essentially unmethylated genome in a non-dividing cell [10]; therefore, it provides a unique opportunity to investigate the extent to which different sequence features acquire methylation as a result of common or distinct mechanisms. Current knowledge is largely limited to the fully established DNA methylome in fully grown oocytes at the germinal vesicle (GV) stage or in ovulated metaphase II (MII) oocytes [5,11], such that differences in the mechanistic requirements for methylation of various sequence elements or in the kinetics of their methylation are obscured. Thus, investigating methylation at intermediate stages would be informative, but genome-wide studies have not yet been done. Analysis of a limited number of imprinted gDMRs showed that de novo methylation is a function of the developmental stage of follicles and of oocyte size, with methylation initiated around the time follicles transition into the antral or secondary follicle stage of development. Moreover, locus-specific analysis has shown that the onset and progression of methylation appear to differ between imprinted gDMRs [12][13][14]. This asynchrony has functional importance, as nuclear transfer experiments have shown that different imprinted domains acquire imprinting competence at different stages of oocyte growth [15].
In view of the rather simple methylation landscape of the oocyte, the differential timing of methylation acquisition at gDMRs is unexpected, and the reasons for this asynchrony are unclear. Understanding its basis is essential for identifying the origin of methylation defects in oocytes that could underlie some errors in imprinting. Such asynchrony also suggests that different factors, or combinations of factors, may be necessary for methylation of different gDMRs, individual CGIs or individual methylated domains, aside from the common requirement for the de novo DNA methyltransferase DNMT3A and its obligate partner DNMT3L [5,7,11]. Given the strong association with transcription [6], and major changes in the transcription programme during oocyte growth [16], one possibility is that the timings of transcription events traversing gDMRs and CGIs could account for differences in the onset of methylation at individual elements.
At a mechanistic level, de novo DNA methylation occurs in a chromatin template and, in accordance with the biochemical properties of DNMT3A and DNMT3L [17][18][19], is predicted to depend upon the acquisition of a permissive histone modification state. Thus, regions destined for DNA methylation are proposed to be marked by histone 3 trimethylated at lysine 36 (H3K36me3) and should lack H3 di- or trimethylated at lysine 4 (H3K4me2/me3) [7,20]. Evidence in support of this model is the requirement for the H3K4 demethylase KDM1B for DNA methylation of most imprinted gDMRs and CGIs that acquire methylation in oocytes and the increase in H3K36me3 at these elements during oocyte growth [20,21]. Such chromatin state changes may also be downstream of transcription events: H3K36me3 is deposited by SETD2 in association with elongating RNA polymerase II [22][23][24], although the role of SETD2 in oocytes has not yet been determined; and removal of H3K4me2 and gain of H3K36me3 at the gDMR of the imprinted locus Zac1 in oocytes was shown to depend on transcription from an upstream, oocyte-specific promoter [6].
To investigate how transcription influences the kinetics of methylation at gDMRs and throughout the genome, we generated genome-wide DNA methylation and high-resolution transcriptome maps of size-selected populations of growing oocytes spanning the onset of methylation. We find that the major remodelling of the oocyte transcriptome occurs well before the onset of DNA methylation, indicating that initiation of transcription events is not temporally coupled to methylation of specific loci. However, rate of gene body methylation does correlate with transcription level, which could reflect the degree of transcription-coupled chromatin remodelling. CGI methylation timing reflects (1) the H3K4me2 levels found in non-growing and early growing oocytes, (2) dependence on H3K4 demethylases and (3) presence of specific transcription factor motifs, supporting a model in which sequences requiring less chromatin remodelling are the earliest to become permissive for de novo methylation.
The overall CpG methylation level of 60-65 μm oocytes determined by PBAT was 22.25%, compared with 2.36% in NGOs and 38.68% in GV oocytes, showing that this stage represents a midpoint in the progression of global de novo methylation (Fig. 1a; Additional file 2: Table S2). We then evaluated whether all genomic features that become methylated in GV oocytes gain methylation at similar rates, including the hypermethylated domains of GV oocytes we previously designated [6]. CpGs in hypermethylated domains have attained on average 48.00 ± 0.02% methylation in 60-65 μm oocytes (Additional file 2: Table S2), although there is a considerable spread in the methylation level of these CpGs at this time (Fig. 1b). We previously showed that 85-90% of hypermethylated domains were associated with transcription units active in oocytes [6]; therefore, we asked whether domains associated with transcription units and those apparently not associated with transcription displayed similar kinetics of methylation. Comparison of the CpG methylation rate of transcribed hypermethylated domains and apparently transcriptionally silent hypermethylated domains revealed that CpGs in transcriptionally silent regions are methylated later: average CpG methylation is 50.1% in transcribed domains but 30.0% in transcriptionally silent regions (Fig. 1b). For CGIs that become fully methylated (≥75%) in GV oocytes, mean methylation (37.21 ± 0.69%) in 60-65 μm oocytes was lower than for most other sequence features (Fig. 1b; Additional file 2: Table S2). An effect of CpG density is also apparent when considering 2-kb genomic windows: regions of highest methylation (≥80%) in 60-65 μm oocytes had on average lower CpG density and GC content (Fig. 1c).
Similar to hypermethylated domains, most classes of transposable elements (TEs) that become methylated (≥75%) in GV oocytes are midway in methylation progression (Fig. 1b; Additional file 2: Table S2, Additional file 3: Fig. S1A), although there was interesting variation in the kinetics of specific elements. Some TEs start at a higher level of methylation in NGOs, such as some endogenous retroviral (ERVK) long-terminal repeat (LTR) elements, reflecting incomplete erasure of methylation in primordial germ cells [4]. In addition, there was significant variation in the rate of methylation of specific TE subfamilies. Notably, of the 20 most abundant LINE-L1 subfamilies, methylation of three of the four L1Md subfamilies was significantly delayed (average methylation of L1Md_A 39.9%, L1Md_F3 44.3% and L1Md_T 42.0%, compared with 48.1-54.4% for the remaining L1 subfamilies). In comparison, there were no differences in the methylation rate of the 20 MaLR subfamilies (Additional file 3: Fig. S1B). L1Md elements are amongst the youngest L1s, with the least degenerated sequence and the most intact transcription factor (TF) binding sites, and they have to be actively suppressed [26,27]. Many of the L1Md subfamilies also retained residual methylation in NGOs (6.5-19.9%, compared with 1.4-3.7% for other L1s). These results indicate that different sequence features acquire methylation with similar overall kinetics, suggesting that the de novo methylation complex is not targeted preferentially to any particular sequence feature. However, the delayed methylation of CGIs and specific L1 subfamilies, as well as at untranscribed regions, points to additional or alternative mechanistic requirements at these elements.
In fully grown oocytes, there is a high level of concordance in methylation of adjacent CpGs across the extensive hypermethylated domains [6]. Having captured oocytes midway in the progression of methylation, we looked at the coherence of ongoing methylation to investigate co-operativity of the de novo methylation complex. For each sequencing read containing multiple CpGs, we asked how often and over what distance CpGs had the same methylation state. For 60-65 μm oocytes, neighbouring CpGs were both methylated 60-70% of the time over 60 bp and at least 50% of the time over 90 bp (Fig. 2a). If CpG sites were being methylated individually without co-operativity, the probability that CpG pairs were both methylated would equate to the overall genomic methylation level which, in 60-65 μm oocytes, was 17.56%. Therefore, these data indicate co-operativity in methylation of adjacent CpGs by DNMT3A/DNMT3L during oocyte growth, similar to findings of DNMT3B function in embryonic stem cells (ESCs) [28]. We note, also, that although concordance of methylation declines with distance, there is a local maximum in the correlation at ~180 bp (Fig. 2b), which approximates the size of a nucleosome, consistent with a model in which de novo methylation occurs in linker regions, as proposed in ESCs [28]. Finally, oocytes have been shown to have extensive methylation outside of the CpG context, in that methylation of non-CpG sites accounts for more than half of the total amount of methylated cytosine [11,29].
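
The read-level concordance measure described in this section can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the input layout (each read as a list of CpG positions with a methylation call), the function name `co_methylation_by_distance` and the 30-bp bin size are assumptions made for illustration. The sketch takes one possible reading of the measure, namely the fraction of CpG pairs on the same read in which the second CpG is methylated given that the first one is; under independent methylation this fraction would be expected to approach the overall methylation level.

```python
from collections import defaultdict
from itertools import combinations


def co_methylation_by_distance(reads, bin_size=30):
    """Fraction of same-read CpG pairs in which the downstream CpG is
    methylated, given that the upstream CpG is methylated, per distance bin.

    reads: iterable of reads, each a list of (position, is_methylated) tuples.
    Returns a dict mapping the start of each distance bin (bp) to the fraction.
    """
    both = defaultdict(int)      # pairs in which both CpGs are methylated
    anchored = defaultdict(int)  # pairs whose upstream CpG is methylated
    for read in reads:
        calls = sorted(read)     # order CpGs by position along the read
        for (pos_a, meth_a), (pos_b, meth_b) in combinations(calls, 2):
            if not meth_a:
                continue
            bin_start = ((pos_b - pos_a) // bin_size) * bin_size
            anchored[bin_start] += 1
            both[bin_start] += int(meth_b)
    return {b: both[b] / anchored[b] for b in sorted(anchored)}


# Toy reads (hypothetical positions and calls, for illustration only).
reads = [
    [(100, True), (112, True), (150, True)],
    [(200, True), (215, False), (290, False)],
    [(300, False), (310, True), (325, True)],
]
result = co_methylation_by_distance(reads)
for bin_start, fraction in result.items():
    print(f"{bin_start}-{bin_start + 29} bp: {fraction:.2f}")
```

Comparing each per-bin fraction with the genome-wide methylation level then gives a simple indication of whether neighbouring CpGs are methylated more coherently than expected by chance.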

CGIs and imprinted gDMRs gain DNA methylation at different rates in oocytes
[Figure caption fragment: The distance-dependent correlation of methylation between CpG pairs in 60-65 µm oocytes, compared with random-shuffled data.]

To look in more detail at the progression of methylation at CGIs, we considered the RRBS datasets. There is very little methylation of CGIs in 40-45 μm oocytes: only three CGIs (of 522 CGIs with sufficient data) that become fully methylated in GV oocytes were methylated ≥25% in 40-45 μm oocytes, and two of these have residual methylation in NGOs [7]. Methylation was first detected in the 50-55 μm size class (22% of CGIs destined for full methylation having ≥25% methylation in this size group) and at least 55% of CGIs showed intermediate (25-75%) to high (≥75%) levels of methylation in 60-65 μm oocytes (Fig. 3a). Overall, there was a very high level of correlation (R = 0.929) between the RRBS and PBAT libraries in CGI methylation at the 60-65 μm stage (Additional file 3: Fig. S2A), suggesting that the differences in level of methylation are reproducible and biological in origin. Focussing on imprinted gDMRs, methylation in 60-65 μm oocytes assessed by the two methods ranged from 0 to ~70% (Fig. 3b; Additional file 3: Fig. S2A), again with a high degree of consistency in methylation of individual gDMRs determined by the two methods (noting that RRBS and PBAT will not have identical sequence coverage across each gDMR). For example, the Igf2r gDMR had attained 32.5% methylation in 50-55 μm oocytes and 67.9% in 60-65 μm oocytes, while the Cdh15 igDMR was <5% methylated even in 60-65 μm oocytes (Fig. 3b). This range of methylation is broadly consistent with earlier studies that analysed limited numbers of gDMRs by locus-specific bisulphite sequencing (again, with the caveat that different regions of the gDMRs will have been assayed by the various methods; [12][13][14]). For a subset of CGIs, we also validated the time of onset by locus-specific bisulphite sequencing (Fig. 3c).
The differential onset of CGI methylation is not related to CpG content or GC richness of these CGIs (Additional file 3: Fig. S2B). In conclusion, CGIs destined for methylation in GV oocytes are not co-ordinately methylated but display substantial and reproducible differences in time of onset of methylation in growing oocytes, and this variation is not a simple property of overall sequence composition.
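
As a small illustration of the thresholds used in this section (≥25% to call onset of methylation, 25-75% intermediate, ≥75% high), CGIs could be binned as in the Python sketch below. The function names, the CGI identifiers and the example values are hypothetical and serve only to show the classification rule; the exact handling of boundary values in the original analysis is an assumption here.

```python
def classify_cgi(methylation_pct):
    """Assign a CGI to a methylation class from its percentage methylation."""
    if methylation_pct >= 75:
        return "high"
    if methylation_pct >= 25:
        return "intermediate"
    return "low"


def fraction_methylating(cgi_methylation):
    """Fraction of CGIs with intermediate or high methylation (>=25%)."""
    classes = [classify_cgi(m) for m in cgi_methylation.values()]
    return sum(c != "low" for c in classes) / len(classes)


# Hypothetical CGIs and methylation levels (%) in one oocyte size class.
example = {"cgi_1": 3.0, "cgi_2": 32.5, "cgi_3": 67.9, "cgi_4": 80.0}
print({name: classify_cgi(m) for name, m in example.items()})
print(fraction_methylating(example))
```

Applying the same rule to each size population in turn gives the kind of per-stage summary plotted as a bar chart of CGI methylation classes.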

Mapping changes in the oocyte transcriptome during oocyte growth
We sought to test the relationship between methylation kinetics and changes in transcription during oocyte development and growth. To do so, we generated deep, strand-specific total RNA-seq libraries in duplicate from the same size populations of growing oocytes as used in methylation analysis, as well as an earlier population (10-30 μm) and a GV population (Additional file 5: Table S4). In addition, the data were compared with RNA-seq from NGOs collected at embryonic day E18.5 [20] and an existing GV data set [6]. Although transcriptional changes have been documented during mouse oocyte development before [16], those data were generated using expression microarrays that capture only a fraction of the transcription units actually present in oocytes and cannot be used to infer alternative transcription start site (TSS) use: our previous work has demonstrated the importance of using the correct transcriptome for accurate association with methylation [6]. Although the RNA-seq data sets do not capture nascent transcription events, they do enable us to determine the time during oocyte growth that transcription units are first active, including the use of alternative upstream TSSs that are prevalent in oocytes [6]. Transcript abundance was used as a proxy for transcription rate.
The RNA-seq data sets were compared with the oocyte transcriptome assembly previously generated in our laboratory [6], resulting in the detection of 21,402-32,775 genes (FPKM thresholds 0.017-0.102) in the various oocyte size populations (Additional file 6: Table S5). Principal component (PC) analysis of the global expression patterns showed that data sets from growing and GV oocytes cluster together, with the E18.5 transcriptome being the most distinct; PC2 segregates the growing oocyte populations by size, particularly when the E18.5 data set is excluded (Additional file 3: Fig. S3). It should be noted that E18.5 oocytes were collected using FACS, such that RNA was extracted from fixed samples, whereas all post-natal oocytes were collected manually, and these technical differences could contribute to some differences between the E18.5 transcriptome and the other stages. Nevertheless, most transcripts (68%) were already detected at E18.5, and a further 28% were detected first in 10-30 µm oocytes, with very few appearing for the first time in larger size populations (Fig. 4a). The general stability of gene expression in the growing oocyte populations, even as cytoplasmic volume and mRNA content are increasing substantially, is reflected in the rather small numbers of genes identified as differentially expressed (<4%) between consecutive stages (Additional file 3: Fig. S4). Based on our oocyte transcriptome assembly, we segregated genes into reference genes (i.e. previously annotated genes) and novel genes, either novel multiexonic or monoexonic. For reference genes expressed from their canonical TSSs, 88% were already detected at E18.5; in comparison, most novel genes were detected first in 10-30 µm oocytes (~63% multi- and ~57% monoexonic novel genes), with a small minority first detected in larger oocytes (~8 and ~13% for multi- and monoexonic genes, respectively; Fig. 4a). Similarly, most (~70%) novel upstream TSSs were activated in 10-30 µm oocytes.
Therefore, most changes in the oocyte transcriptome occur well in advance of the onset of de novo methylation, which initiates after the 40-45 µm stage. This effect can be seen at individual imprinted loci: all gDMRs are found within transcription units even at the earlier stages, irrespective of whether they are transcribed from alternative promoters or whether methylation is detected early (50-55 µm) or late in oocyte growth (60-65 µm) (Fig. 4b). Despite the general stability of gene expression during oocyte growth stages (Additional file 3: Fig. S4), the RNA-seq data sets provide unprecedented detail on the changes in transcript abundance during critical times in oocyte growth and follicular differentiation. We identified 530 genes, mostly protein-coding, up-regulated more than 50-fold between E18.5 and GV oocytes, and 283 up-regulated more than 50-fold between E18.5 and 10-30 μm oocytes (Additional file 7: Table S6). Gene ontology (GO) analysis did not reveal particularly strong enrichment terms ("Regulation of reproductive process", containing 10 of the 283 genes, had the highest enrichment of 5.53; p value 1.27 × 10^-5, adjusted FDR 0.164), perhaps reflecting the wide diversity of functions required during oogenesis as well as the accumulation of maternal RNA stores for processes in the zygote (Additional file 3: Fig. S5). The set of highly induced transcripts did contain genes for oocyte-specific transcriptional regulators such as OBOX1, 2 and 5, the maternal effect homeobox SEBOX, the zona pellucida proteins 1, 2 and 3 (ZP1, 2, 3), components of the subcortical maternal complex (OOEP, TLE6) and members of the reproduction-related NLRP family (nucleotide-binding oligomerization domain, leucine-rich repeat and pyrin domain-containing proteins), as well as oocyte genes with less well explored functions (Oas1d, Oosp1, Omt2b) (Additional file 7: Table S6).
We also specifically examined the gene expression dynamics of candidate factors involved in de novo DNA methylation and associated epigenetic modifications, such as DNMT3A and DNMT3L, H3K4 demethylases of the KDM1 and KDM5 families, and the H3K36 methyltransferase SETD2. Although many of the corresponding genes appear to be stably expressed during oocyte growth, there was substantial up-regulation of Kdm1b, Dnmt1 and particularly Dnmt3L, whose transcripts appear first in 10-30 μm oocytes (Additional file 3: Fig. S6). These transcript dynamics are consistent with the reported appearance of KDM1B and DNMT3L proteins during oocyte growth [21,30].
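
The stage-of-first-detection summary used in this section (the proportion of transcripts first detected at E18.5, in 10-30 µm oocytes, and so on) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline used here: the stage labels, the single detection threshold of 0.05 FPKM, the gene names and the FPKM values are all hypothetical, whereas the study applied stage-specific thresholds in the range 0.017-0.102 FPKM.

```python
from collections import Counter

# Stages in developmental order (labels are illustrative).
STAGES = ["E18.5", "10-30 um", "40-45 um", "50-55 um", "60-65 um", "GV"]


def first_detected_stage(fpkm_by_stage, thresholds):
    """Return the earliest stage at which a gene reaches its detection
    threshold, or None if it is never detected."""
    for stage in STAGES:
        if fpkm_by_stage.get(stage, 0.0) >= thresholds[stage]:
            return stage
    return None


def first_detection_fractions(expression, thresholds):
    """Fraction of detected genes first seen at each stage."""
    counts = Counter(
        first_detected_stage(values, thresholds) for values in expression.values()
    )
    detected = sum(n for stage, n in counts.items() if stage is not None)
    return {s: counts[s] / detected for s in STAGES if counts[s]}


# Hypothetical thresholds and expression values (FPKM).
thresholds = {stage: 0.05 for stage in STAGES}
expression = {
    "geneA": {"E18.5": 1.2, "10-30 um": 2.0},
    "geneB": {"E18.5": 0.0, "10-30 um": 0.8},
    "geneC": {"E18.5": 0.01, "10-30 um": 0.02, "40-45 um": 0.5},
    "geneD": {"E18.5": 3.0},
    "geneE": {"E18.5": 0.0, "GV": 0.01},  # never reaches the threshold
}
fractions = first_detection_fractions(expression, thresholds)
print(fractions)
```

Genes that never reach the threshold are excluded from the denominator, so the fractions sum to one over detected genes only.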

DNA methylation kinetics in relation to transcription events
Although the global results above do not support a major role for activation of specific transcription units in the timing of de novo methylation, we performed several additional analyses to investigate in more detail possible relationships between transcription events and temporal control of methylation. We compared the methylation level of multiexonic reference genes and multiexonic novel genes, reasoning that the reference genes are generally expressed from earlier time points in oocyte growth (Fig. 4a). For this, we selected genes ≥4 kb in length (as shorter genes are unmethylated across much of their length) and set an expression threshold of ≥2 FPKM (to mitigate an effect of expression level). In this comparison, reference genes as a set have accumulated more methylation in 60-65 µm oocytes (Fig. 5a). Level of expression could still contribute to this effect, as novel genes are less highly expressed [6]: for the genes we included above 2 FPKM, median FPKM values were 11.4 and 3.9 for reference and novel genes, respectively. Indeed, there was a positive correlation between gene body methylation and expression level in 60-65 µm oocytes, particularly for reference genes, although the relationship plateaus for more highly methylated gene bodies (Fig. 5b). We also considered whether genes exceeding an expression threshold earlier during oocyte growth acquire methylation sooner, and this appeared to be the case (Fig. 5c). Again, however, it is difficult to separate out an effect of gene expression level, as genes crossing the threshold earlier are also more highly expressed in 60-65 µm oocytes (Fig. 5d). An effect of host gene expression was apparent for intragenic CGIs that gain methylation during oocyte growth, although the differences between groups were not significant (Fig. 5e). We also examined whether the extent of methylation of these CGIs in 60-65 µm oocytes reflected whether they were active TSSs at an earlier stage (E18.5 NGOs).
Indeed, CGIs previously acting as TSSs had gained less methylation on average than non-TSS-CGIs (Fig. 5f). This analysis was performed with the PBAT data set, as RRBS data have limited coverage of gene bodies. When we compared DNA methylation of intragenic CGIs in 50-55 and 60-65 µm RRBS data sets with expression levels of overlapping genes in the corresponding RNA-seq datasets, we obtained similar results to the PBAT data (Additional file 3: Fig. S7).
Fig. 3 CpG islands gain DNA methylation at different rates in growing oocytes. a Bar chart of CGI methylation in the oocyte size populations from the RRBS and PBAT datasets. The number of CGIs covered in each dataset is given in Additional file 1: Table S1. b Methylation of gDMRs in RRBS datasets, displaying the basal level in 40-45 µm oocytes, and the increases in methylation to the subsequent size populations. gDMRs are ordered according to their methylation level in 60-65 µm oocytes, which is comparable with PBAT data (see Additional file 3: Fig. S2A). c Validation of CGI methylation in different oocyte size populations. Heatmap shows methylation progression at CGIs that become methylated between 40-45 µm and MII oocytes (data from published GV and MII RRBS datasets). Five early-methylating CGIs and five late-methylating CGIs were selected, and their methylation in 50-55 µm oocytes (early-methylating CGIs) or both 50-55 and 60-65 µm oocytes was confirmed by locus-specific bisulphite sequencing.

Changes in TSS use could reflect changes in binding of sequence-specific TFs at these sites, possibly as a consequence of down- or up-regulation of these factors during oocyte growth. In this context, it has previously been reported that the CGCGC consensus site of E2F1 and E2F2 is enriched in intragenic CGIs that are completely resistant to de novo methylation in oocytes [31]. Accordingly, we used the motif analysis package DREME [32] to identify motifs differentially enriched in CGIs with different levels of methylation in 60-65 µm oocytes. We searched for motifs enriched in late-methylated CGIs (≤25%) compared to CGIs with 25-50, 50-75 and ≥75% methylation, as well as for motifs enriched in early-methylated CGIs (≥75 and 50-75% methylation) compared to late-methylated CGIs (Fig. 6a).
There were no motifs enriched in early-methylated CGIs compared to CGIs gaining methylation later, suggesting that there is no sequence motif targeting methylation to specific CGIs. On the other hand, we found motifs significantly enriched in late-methylated CGIs. Considering the comparison between ≥75% methylated and ≤25% methylated CGIs as likely to give the greatest discriminating signal, there were 21 sequence motifs with a significant difference in enrichment, three of which correspond to binding sites of known TFs (Fig. 6b; Additional file 8: Table S7). Of these, the most significant motif C(C/G/T)CCGCC (p value = 7.4 × 10^-13) was detected in 55% of the late-methylating CGIs but only 9.5% of the early-methylating CGIs. We repeated the analysis with the MEME motif analysis package [33] to search for longer motifs than DREME. Again, the significantly enriched motifs were found only in late-methylated CGIs compared to CGIs with methylation of 50-75 and ≥75%. Late-methylated CGIs appear to be enriched in G-rich motifs; however, these motifs are also present in 50% or more of the early-methylating CGIs (Additional file 3: Fig. S8).
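
To make the motif-occurrence comparison concrete, the Python sketch below counts the fraction of sequences in two groups that contain the C(C/G/T)CCGCC motif. It is a simple regular-expression scan, not a reimplementation of DREME or MEME, and it performs no enrichment statistics. The sequences are short invented examples, and scanning both strands is an assumption made for this sketch.

```python
import re

# C(C/G/T)CCGCC written as a regular expression.
MOTIF = re.compile(r"C[CGT]CCGCC")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_motif(seq):
    """True if the motif occurs on either strand of the sequence."""
    seq = seq.upper()
    return bool(MOTIF.search(seq) or MOTIF.search(reverse_complement(seq)))


def motif_fraction(seqs):
    """Fraction of sequences containing at least one motif occurrence."""
    return sum(has_motif(s) for s in seqs) / len(seqs)


# Invented toy sequences standing in for late- and early-methylating CGIs.
late = ["ATCGCCGCCTA", "TTGGCGGGGAA", "ATATATATAT"]
early = ["ATATATAT", "TTGACGTCAA", "ACGTACGT"]
print(f"late: {motif_fraction(late):.2f}, early: {motif_fraction(early):.2f}")
```

The second toy sequence carries the motif only on the reverse strand, which is why the scan checks the reverse complement as well.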

CGI methylation in relation to chromatin state
Since transcription does not appear to be an overriding factor in the differential timing of CGI methylation, we examined the influence of specific histone post-translational modifications, given the likely importance of chromatin state in recruitment of the DNMT3A:DNMT3L complex. We divided CGIs that become fully methylated (≥75%) in GV oocytes according to the level of methylation attained in 60-65 μm oocytes and assessed the enrichment of histone modifications as determined by chromatin immunoprecipitation and sequencing (ChIP-seq) in NGOs (isolated at E18.5) and early growing oocytes (post-natal day p10) [20]. Of the modifications implicated in promoting or antagonising DNA methylation, levels of H3K36me3 showed a positive correlation with DNA methylation level; H3K4me2 and H3K4me3, conversely, were negatively correlated (Fig. 7a, all p values <1 × 10^-10). We then asked whether there was a relationship with dependence on the H3K4me2 demethylases KDM1A and KDM1B. We have previously shown that loss of KDM1B, in particular, affects the methylation level acquired in MII oocytes of many CGIs, but there is considerable variation in the magnitude of the dependency [20]. Therefore, we compared the change in DNA methylation of CGIs in oocytes deficient in KDM1A or KDM1B with level of methylation in wild-type, 60-65 μm oocytes, which showed that later-methylating CGIs (i.e. less methylation in 60-65 μm oocytes) are most dependent on KDM1A or KDM1B to become fully methylated in MII oocytes (Fig. 7b). Examples of early- and late-methylating CGIs in relation to H3K4me2 level and KDM1B dependence are shown in Fig. 7c.

Modelling factors determining rate of CGI methylation
To test the extent to which the above variables, alone or in combination, account for the differential timing of CGI methylation in growing oocytes, we applied several regression models. We considered up to nine independent variables, including the three transcription factor binding motifs significantly enriched in the late-methylating CGIs (Table 1), with methylation level in 60-65 µm oocytes as response variable. As all the variables except GC content show a statistically significant linear relationship with the response variable, we first tested how much of the methylation variation could be attributed to each of the variables alone in simple linear regression models. H3K4me2 enrichment at p10 and dependence on KDM1B and KDM1A explained the highest proportion of the variability in the methylation data: 11.2, 10.5 and 9.7%, respectively. Because of the multicollinearity amongst independent variables (e.g. high correlation between transcription level and H3K36me3 enrichment, or between H3K4me2 and H3K4me3 enrichments), we could not test the combination of all variables in a classical multiple linear regression model. Instead, we applied linear modelling approaches that correct for multicollinearity (Ridge, Lasso and ElasticNet regressions) and looked for the best fit. Lasso and ElasticNet regression models using all nine variables explain 23.14% of the variability (Fig. 8). However, the cross-validation of models, where individual independent variables are added one by one to the model, in each step adding the variable that explains the highest proportion of the variability, revealed that [...] (Table 1). Although the remaining variables increase the explained proportion of methylation variability, they also increase the noise level and therefore do not statistically improve the model. We also tested other regression modelling approaches not requiring linear relationships, such as polynomial regression; however, the fit of the models was not improved.
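
The first modelling step described above, in which each variable is tested alone in a simple linear regression and ranked by the proportion of methylation variability it explains, can be sketched in plain Python as follows. This is only an illustration: the variable names and all values are invented, and the regularised models (Ridge, Lasso, ElasticNet) and cross-validation used in the study are not reproduced here.

```python
def r_squared(x, y):
    """Proportion of variance in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy


def rank_predictors(predictors, response):
    """Sort predictors by the variance each explains on its own."""
    scores = {name: r_squared(values, response) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented data: methylation (%) of five CGIs and two candidate predictors.
methylation = [80, 60, 40, 20, 10]
predictors = {
    "h3k4me2_p10": [1, 2, 3, 4, 5],          # hypothetical enrichment values
    "gc_content": [0.6, 0.7, 0.6, 0.7, 0.65],  # hypothetical GC fractions
}
ranked = rank_predictors(predictors, methylation)
for name, score in ranked:
    print(f"{name}: R^2 = {score:.3f}")
```

With real data, strongly correlated predictors would each score well individually while contributing little when combined, which is the motivation for the regularised models described in the text.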

Discussion
DNA methylation in the mouse oocyte depends upon DNMT3A and DNMT3L, is primarily over gene bodies and is largely determined by transcription, but these global dependencies could obscure sequence-specific requirements or the involvement of additional factors at specific elements. By capturing oocytes in the mid-phase of de novo methylation, we find that most sequence features gain CpG methylation at similar rates overall, including most classes of TEs, suggesting a universal rather than a feature-specific targeting mechanism. CGIs as a whole and a subset of L1 elements gain methylation later, however. In relation to CGIs, this relative delay might reflect that they are marked by default with histone modifications antagonistic to DNA methylation, such as H3K4me2/me3, and younger L1 elements may be suppressed by histone modifications inhibitory to DNA methylation. Amongst CGIs, however, there are reproducible differences in time of onset and/or progression in de novo methylation. This finding, at the genome-wide scale, substantially extends earlier studies on limited numbers of imprinted gDMRs [12][13][14] and suggests that CGIs destined for methylation initially exist in different states of permissiveness.
There are a number of factors that could contribute to this asynchrony. Nuclear availability of DNMT3A [34] and DNMT3L is an absolute requirement, and DNMT3L is potently up-regulated during oocyte growth. However, a study in which DNMT3A2 and DNMT3L were precociously expressed in oocytes was not able to induce methylation of imprinted gDMRs in NGOs, but did advance methylation of some gDMRs in growing oocytes [30], indicating that some loci are in a state permissive for methylation earlier than others.
Having established a major role for transcription in conferring the DNA methylation landscape of the oocyte, including at CGIs [5][6][7], we reasoned that timing of transcription events could influence timing of methylation. In fact, we did not find strong evidence to support this proposition. Despite substantial transcriptional changes during initial stages of oocyte growth, most changes occur well in advance of the onset of methylation, indicating that remodelling of the oocyte transcriptome is not a rate-limiting step in determining permissiveness of individual loci. We did, however, find a positive correlation between expression and methylation, so that more highly expressed gene bodies on average gain methylation earlier than less highly expressed genes; this effect could be mediated through transcription-dependent chromatin remodelling, including deposition of H3K36me3, whose levels over gene bodies scale with expression in oocytes [20]. A caveat to our analysis is that we used transcript abundance as measured by RNA-seq as a proxy for transcription, rather than directly determining active transcription events. This is because methods have not been developed to allow nascent transcription (such as by NET-seq) to be mapped in small numbers of cells. However, at the very least, the RNA-seq data allow us to determine the time at which genes are first transcribed during oocyte growth.
To explain the difference in onset of methylation at CGIs, we considered the contributions of up to nine variables for which data were available. In combination, these variables explain about a fifth of the variation in timing of methylation establishment, with chromatin factors (H3K4me2 enrichment, KDM1A dependence and KDM1B dependence) having the greatest individual effects. There may be several reasons that we are not able to account for more of the variation at this time, apart from unknown factors not included in the modelling. One reason might be the relative imprecision in some of the data types; for example, low-cell ChIP-seq data for histone modifications in growing oocytes are inherently noisy, being at the limits of the capability of this method, and will have missing values at some CGIs. In comparison, PBAT data from Kdm1a- and Kdm1b-null MII oocytes are likely to be more precise. Therefore, it is reassuring that the magnitude of the individual effects of H3K4me2 enrichment and KDM1B dependence is so similar, since these are likely to be partially dependent variables given that we previously concluded that KDM1B is the major locus-specific H3K4me2 demethylase in oocytes [20]. It was previously suggested that KDM1B may be required to allow methylation of imprinted gDMRs that acquire methylation late in oocyte growth [21]; our genome-wide analysis and modelling partly support this earlier inference.
Gross sequence composition accounts for little of the variation in CGI methylation timing. Although CpG density is a determinant of H3K4me2 enrichment at CGIs, CGIs destined for methylation in oocytes are relatively depleted of H3K4me2 irrespective of CpG density [20]. Several sequence motifs, however, were differentially represented in early- and late-methylating CGIs. Individually, these motifs are not as discriminating as the ZFP57 binding site in imprinted gDMRs [35] that ensures retention of methylation after fertilisation, or the E2F1/E2F2 motif enriched in CGIs that escape DNA methylation in oocytes [31]. When combined, the three motifs for known TFs explain about half as much of the variation in methylation onset as does each of the chromatin factors. These motifs correspond to binding sites for 15 TFs expressed at varying levels in oocytes. Although some of their transcripts are down-regulated during oocyte growth (Additional file 9: Table S8), it is not possible at this stage to conclude whether the dynamics of any of these TFs underlie the differential methylation onset of the CGIs.
By capturing the progression of methylation, we also reveal other important aspects of de novo methylation in an in vivo setting, extending the significance of studies done in models such as ESCs. For example, we identify a co-operativity and nucleosomal pattern of DNMT3A action similar to that observed in ESCs [28]. Non-CpG methylation has been described as a property of oocytes as well as other non-dividing cells [11,29,36], but remains an enigmatic modification. Even in oocytes, in which methylation globally at CHG and CHH sites exceeds that at CG sites, few non-CpG sites are methylated (genome-wide average methylation is 3.9% at CHG sites and 3.0% at CHH sites, compared with 38.7% at CG sites, as quantified with our parameters using published data [11]), and those sites are mostly methylated only to intermediate levels; moreover, CHH/CHG methylation is highly associated with domains of CpG methylation. Combined with its very much later onset, this suggests that CHG/CHH methylation is largely a by-product of sustained DNMT3A activity. Finally, DNA methylated domains not associated with transcribed regions are also late in acquiring methylation, suggesting that they require additional remodelling steps or a distinct mechanism of de novo methylation.

Conclusions
The mammalian oocyte provides an important model to understand DNA methylation mechanisms, because an entire methylation landscape is established de novo in a non-dividing cell. Epigenetic remodelling events culminate in a distinctive DNA methylation landscape, including the programmed methylation of a defined set of CGIs, mostly associated with transcription units. Despite the simplicity of the methylation landscape, various sequence elements are not co-ordinately methylated, with pronounced asynchrony in methylation of CGIs. In this study, we generated methylation and transcriptome data sets to test whether timing of transcription events explained asynchrony of CGI methylation; however, our results do not support transcriptional transitions as a major factor in time of onset of methylation. By incorporating data on chromatin state, TF binding motifs and the effect of deficiency in H3K4 demethylases, we could account for about a fifth of the variation in CGI methylation timing, suggesting that sequences least dependent on chromatin remodelling are the earliest to become permissive for methylation.

Isolation and size selection of growing oocytes
Oocytes were collected from C57BL/6Babr mice. Ovaries were removed and digested for 30 min at 37 °C in 1× PBS containing 2 mg/ml collagenase (Sigma-Aldrich, C2674) and 0.025% trypsin (Sigma-Aldrich, 93615). M2 medium (Sigma-Aldrich, M7167) was added to dilute the digestion mix, and oocytes were picked up with a mouth-controlled drawn-out glass pipette. To eliminate contaminating somatic cells, oocytes were washed extensively in clean drops of M2 medium. A stage micrometre was used in combination with an eyepiece reticle to measure sizes of oocytes. Mice of post-natal days p5-7, p7-12, p9-14 and p13-16 were used to collect oocytes of 10-30, 40-45, 50-55 and 60-65 µm in diameter, respectively; GV oocytes were collected at p20.
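The size classes above can be expressed as a small lookup, shown here as a minimal Python sketch. The diameter ranges and donor ages are taken from the text; the names `SIZE_CLASSES` and `size_class` are hypothetical, and treating the range boundaries as inclusive is an assumption. Diameters that fall between the stated ranges are returned as unassigned.

```python
# (minimum diameter in µm, maximum diameter in µm, post-natal age of donor mice)
SIZE_CLASSES = [
    (10, 30, "p5-7"),
    (40, 45, "p7-12"),
    (50, 55, "p9-14"),
    (60, 65, "p13-16"),
]


def size_class(diameter_um):
    """Return the (min, max, donor age) class for a measured diameter, or None.

    Boundaries are treated as inclusive (an assumption); diameters outside
    every stated range, e.g. 35 µm, belong to no class.
    """
    for low, high, donor_age in SIZE_CLASSES:
        if low <= diameter_um <= high:
            return (low, high, donor_age)
    return None


for measured in (22.0, 35.0, 62.5):
    print(measured, size_class(measured))
```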

Generation of PBAT and RRBS libraries
RRBS libraries were generated, in duplicate, from ~450 to 550 oocytes per size-selected population, as previously described [7], but without the gel-extraction step. Briefly, DNA was spiked with a small amount of lambda DNA (0.05 pg per 6 ng genomic DNA) as a bisulphite conversion control, digested with MspI (Thermo Fisher Scientific, ER0541), end-repaired (Klenow fragment exo-, Thermo Fisher Scientific, EP0421, with 10 nM dATP, 1 nM dCTP and 1 nM dGTP) and ligated to 5mC-adapters (Illumina) with T4 ligase (Thermo Fisher Scientific, EL0014). Bisulphite conversion was done using the EZ DNA Methylation-Direct Kit (Zymo Research, D5020), and DNA was amplified by 18 cycles of PCR using PfuTurbo Cx Hotstart DNA polymerase (Agilent, 600410). Libraries were purified using SPRI beads (Agencourt, A63880) and sequenced 40 bp single end on an Illumina Genome Analyzer IIx. The PBAT library was constructed from 200 oocytes of 60-65 µm diameter as previously described [20] and sequenced 100 bp paired end on an Illumina HiSeq 1000.
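Because the lambda spike-in is unmethylated, cytosines in reads mapping to lambda should all read as thymine after complete conversion, so the fraction of converted calls gives an estimate of conversion efficiency. The following minimal Python sketch shows that arithmetic under stated assumptions: the function names, the example counts and the 0.99 acceptance threshold are hypothetical and are not taken from this study.

```python
def conversion_rate(converted_calls, unconverted_calls):
    """Estimate bisulphite conversion efficiency from unmethylated spike-in reads.

    converted_calls: cytosine positions read as T (converted).
    unconverted_calls: cytosine positions still read as C (not converted).
    """
    if converted_calls < 0 or unconverted_calls < 0:
        raise ValueError("call counts must be non-negative")
    total = converted_calls + unconverted_calls
    if total == 0:
        raise ValueError("no cytosine calls on the spike-in")
    return converted_calls / total


def passes_conversion_check(converted_calls, unconverted_calls, threshold=0.99):
    """Return True if the estimated conversion rate meets the threshold.

    The default threshold of 0.99 is an illustrative assumption.
    """
    return conversion_rate(converted_calls, unconverted_calls) >= threshold


# Hypothetical counts from reads mapped to the lambda genome.
rate = conversion_rate(converted_calls=9950, unconverted_calls=50)
print(f"Estimated conversion rate: {rate:.3%}")
```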

Generation of strand-specific RNA-seq libraries
Strand-specific RNA-seq libraries were generated as previously described [6] and sequenced 100 bp paired end on an Illumina HiSeq 1000. The numbers of oocytes used per library are listed in Additional file 5: Table S4.

Conventional bisulphite sequencing
Bisulphite sequencing was performed essentially as previously described [29] using DNA from ~100 to 200 oocytes plus 50 ng lambda DNA spike-in for each