Methylome Analysis and Epigenetic Changes Associated with Menarcheal Age

Reproductive factors have been linked to both breast cancer and DNA methylation, suggesting methylation as an important mechanism by which reproductive factors impact on disease risk. However, few studies have investigated the link between reproductive factors and DNA methylation in humans. Genome-wide methylation in peripheral blood lymphocytes of 376 healthy women from the prospective EPIC study was investigated using LUminometric Methylation Assay (LUMA). Also, methylation of 458877 CpG sites was additionally investigated in an independent group of 332 participants of the EPIC-Italy sub-cohort, using the Infinium HumanMethylation 450 BeadChip. Multivariate logistic regression and linear models were used to investigate the association between reproductive risk factors and genome wide and CpG-specific DNA methylation, respectively. Menarcheal age was inversely associated with global DNA methylation as measured with LUMA. For each yearly increase in age at menarche, the risk of having genome wide methylation below median level was increased by 32% (OR:1.32, 95%CI:1.14–1.53). When age at menarche was treated as a categorical variable, there was an inverse dose-response relationship with LUMA methylation levels (OR12–14vs.≤11 yrs:1.78, 95%CI:1.01–3.17 and OR≥15vs.≤11 yrs:4.59, 95%CI:2.04–10.33; P for trend<0.0001). However, average levels of global methylation as measured by the Illumina technology were not significantly associated with menarcheal age. In locus by locus comparative analyses, only one CpG site had significantly different methylation depending on the menarcheal age category examined, but this finding was not replicated by pyrosequencing in an independent data set. This study suggests a link between age at menarche and genome wide DNA methylation, and the difference in results between the two arrays suggests that repetitive element methylation has a role in the association. Epigenetic changes may be modulated by menarcheal age, or the association may be a mirror of other important changes in early life that have a detectable effect on both methylation levels and menarcheal age.


Introduction
In addition to genetic changes, epigenetic changes and particularly DNA methylation can play an important role in the aetiology of chronic diseases such as cancer [1][2][3][4][5]. Gene specific promoter methylation can silence genes involved in critical cellular processes such as cell cycle regulation, DNA repair or apoptosis. At the same time, genome wide hypomethylation and in particular reduced methylation in repetitive elements such as Long Interspersed Nuclear Element-1 (LINE-1) and Alu repeats has been associated with chromosomal instability and mutations leading to chronic disease [1,3,4,6]. Methylation changes are most evident in tissues such as tumour biopsies when compared to normal tissue. However, genome wide methylation changes in relation to disease have been observed in surrogate tissues such as Peripheral Blood Leukocyte (PBL) DNA. Aberrant methylation in PBLs has been previously associated with breast cancer [7,8], colorectal adenoma [9,10], gastric cancer [11], head and neck squamous cell carcinoma [12], and bladder cancer [13]. A recent meta-analysis of all relevant studies has shown that there is overall little evidence to support an association with cancer using surrogate assays [14]. The exception has been one study based on a large population-based case-control study, the Long Island Breast Cancer Study Project, LIBCSP, with over 2,100 peripheral blood samples, which revealed greater global and promoter specific methylation in PBLs of breast cancer cases using LUMA [15]. Although the potential influence of the disease onset on the methylome of blood DNA needs to be tested, these results suggest that methylation in PBLs DNA can serve as a biomarker for chronic diseases such as cancer; it also points to a role of aberrant methylation in carcinogenesis.
Environmental exposures influence epigenetic changes, including methylation levels, particularly in utero and in early life [16,17]. In fact, genomic methylation has been shown to differ with respect to several accepted disease risk factors. These include age, race, anthropometric measures, environmental exposures and dietary factors [18][19][20][21][22]. For example, a prudent dietary pattern characterized by high intake of vegetables and fruit was shown to be associated with a lower prevalence of genomic hypomethylation (17,19). Also, alcohol drinking and low dietary folate were found to impact on genomic DNA methylation -genome wide and gene specific [8,19]. In addition, in a multiethnic birth cohort in New-York City [18], BMI was not found to be associated with DNA methylation, but elsewhere, in women of childbearing age, a higher BMI was associated with lower global methylation [23]. In line with the latter finding, Zhang et al. [22] showed that higher physical activity is associated with higher global methylation in a cancer free population.
Reproductive factors were also shown to impact on global DNA methylation. Terry et al. [18] showed that factors that impact on breast cancer risk, including a greater birth height, a later age at menarche, nulliparity, and a later age at first birth were associated with higher global DNA methylation levels, but these results were not replicated in other studies [8,24]. However, the studies that investigate reproductive factors and epigenetic alterations are few [25].
In the present study we aim to investigate the impact of a number of reproductive variables on DNA methylation in PBLs of healthy individuals. The relationship was first investigated with genomic DNA methylation measurements using LUminometric Methylation Assay (LUMA) in 376 women. LUMA is a cytosine extension assay where the ratio of DNA CpG site cleavage by methylation sensitive restriction endonucleases (HpaII) to the cleavage from methylation insensitive endonucleases (MspI) is used to determine % global methylation. HpaII cleavage occurs most frequently in CpG island promoters and repetitive elements thus methylation at these sites heavily influences the LUMA methylation estimate. Subsequently, to replicate the findings observed with LUMA, whole genome methylation patterns were obtained using Illumina 450 K in an independent group of 332 women. The Illumina 450 K array covers 485,577 CpG sites, achieving a high coverage of the entire genome, excluding repetitive elements.

LUMA Methylation is Associated with Menarcheal Age
Demographic, anthropometric, lifestyle, and reproductive characteristics for subjects included in Stage 1 are presented in Table 1. The median LUMA genome-wide methylation in these subjects was 71.7% and the standard deviation was 5.7%. Of all the anthropometric measures and lifestyle variables examined, only age at menarche was found to significantly differ across quartiles of percent genome wide methylation (Table 2). Higher genome wide methylation was associated with a younger age at menarche (Kruskal-Wallis P-value = 0.002), and this association was significant even after Bonferroni correction for multiple testing ( Table 2). Age at blood collection, height, weight, BMI, physical activity, smoking status, daily alcohol, folate consumption, age at FFTP, menopausal status, parity, breastfeeding, oral contraceptive (OC) use, hormone replacement therapy (HRT) use and highest level of education achieved did not significantly differ between subjects in methylation quartiles ( Table 2).
The association between age at menarche and methylation was further examined using logistic regression to adjust for potential confounders. When median methylation was used as a cut off, 194 subjects had methylation below median levels and 182 had methylation levels above median. Using these two classes, logistic regression showed that age at menarche was significantly associated with class occupancy. As shown in Table 3, for every yearly increase in age at menarche, the risk of having below median methylation was increased by 32% (OR: 1.32, 95%CI: 1.14-1.53). When age at menarche was treated as a categorical

Illumina 450 k Methylome Analysis Identifies an Epi-allele Associated with Menarcheal Age
In the second population group, 329 subjects (out of 332) had available information on age at menarche (Table 1). In contrast to LUMA global methylation, the median genome-wide methylation level using the 450 k ILLUMINA assay did not significantly differ between menarcheal age groups. Similarly, CpG island methylation and promoter methylation were not significantly different between subjects in different menarcheal age categories. However, there was a trend towards decreasing methylation with increasing age at menarche, consistent with the LUMA results ( Figure 1).
When adjusting for case-control status, age, and position on the chip in a linear regression model with methylation M-values as a continuous outcome, and age at menarche as a categorical variable (.11 yrs vs. #11 yrs), age at menarche was significantly associated with methylation in a single CpG site (cg01339004), located on the body of the SMAD6 gene (p,1.00610 27 , genomewide level significance) (Table 4, Figure 2). When only those subjects that remained healthy for at least 5 years following recruitment and blood collection were analysed, the same CpG site was found to be significantly associated with age at menarche (p = 6.71610 28 ).
However, using bisulphite Pyrosequencing for the SMAD6 cg01339004 locus, we were unable to replicate this finding in an independent sample set using a generalized linear model while adjusting for the same confounders (n = 185, p = 0.07). Wilcoxon rank sum non-parametric test also did not reveal significantly differential methylation between the two age at menarche categories (p = 0.082) measured using bisuphite pyrosequencing ( Figure 2).

Discussion
In this study, age at menarche was negatively associated with LUMA genome wide methylation in a statistically significant manner. The association of genome wide methylation with menarcheal age was the only strong and consistent association we found and remained unaltered after adjustment for relevant confounders. Previous study results on age at menarche and methylation were conflicting. Terry et al. [18] found that a later age at menarche was associated with higher genomic global methylation later on in adulthood, but DNA methylation was only assessed in 92 individuals and the authors used a different technique for measuring global methylation ([ 3 H]-methyl acceptance assay). On the other hand, Choi et al. [8] did not demonstrate a statistically significant association between menarcheal age and global DNA methylation using LINE1 methylation as a surrogate for global methylation.
The negative association between LUMA methylation and later age at menarche is counter-intuitive because (a) a later age at menarche is known to protect from breast cancer, and (b) lower global methylation is expected to increase genome instability and thus increase cancer risk [8,26]. However, our observation is consistent with the findings in the LIBCSP study, where breast cancer was associated with increased genome wide methylation as measured with LUMA [15]. This apparent paradox could be explained by LUMA's characteristics, i.e. broad coverage in CpG dense regions, such as promoters, and decreased coverage in the remaining genome [27]. Another potential explanation is that aberrant methylation associated with age at menarche is unrelated to the methylation changes relevant to breast cancer, or that the association with age at menarche is in fact confounded by other determinants of methylation levels.
Given the conflicting reports in the literature [8,15,18], we aimed to replicate, in a dataset with whole genome methylation data, the association between age at menarche and DNA methylation that we observed with the LUMA technology. This was done by using the robust Illumina technology. This approach also enabled the identification of specific genes that might be involved in the mechanistic pathways linking menarcheal age with disease. In contrast to the findings of LUMA, genome wide methylation in this second dataset did not significantly differ between subjects in different menarcheal age groups. However, there was a non-significant trend towards decreasing methylation with increasing age at menarche, consistent with the LUMA findings ( Figure 1). The lack of association in this dataset could be caused by differences in coverage between the two assays used. LUMA assesses methylation of a specific restriction enzyme site (HpaII, CCGG), which occurs most frequently in CpG island promoters -also covered by the 450 K array -but also in repetitive elements. However, the Infinium HumanMethylation 450 BeadChip, due to its probe design, does not interrogate  repetitive elements which are found to be differentially methylated in many cases of neoplasia [28,29]. For example, satellite and SINE repeats were found to be enriched with hypomethylated Differentially Methylated Regions (DMRs) whereas LINE was enriched with hypermethylated DMRs in malignant peripheral nerve sheath tumours compared to normal Schwann cells [30]. If age at menarche is related to methylation patterns in these repetitive elements, the hypomethylation would not have been evident in the 450 K chip but it would have been detected in the LUMA assay. This suggests that the LUMA based association is being driven largely by methylation differences in repetitive elements, where age at menarche could have a greater effect. In the locus by locus analysis, methylation of a single CpG site was shown to be associated with age at menarche. However, in an independent sample set, this finding was not replicated. Given the multiple comparisons in the locus by locus analyses in the Illumina dataset, one cannot rule out the possibility that this finding is the result of chance, and given that the independent sample set did not replicate this finding using an alternative method, we conclude that it is likely to be a false positive association. However, further validation in further independent data sets, with a greater sample size may increase the power sufficiently to detect possible associations between methylation of individual loci and age at menarche.
The mechanistic link which could explain the association between menarcheal age and genome-wide DNA methylation, but not in individual CpG loci is yet to be determined. However, endogenous oestrogen exposure is a strong candidate for epigenetic changes since an earlier age at menarche exposes a woman to a greater cumulative amount of endogenous oestrogens  over her lifetime, and there have been reports which show that oestrogen impacts on DNA methylation. More specifically it was shown that oestrogen receptor (ER) positive breast tumour tissues have differential methylation at several CpG loci compared to ER negative tumours [24,31] and oestrogen induced breast tumours have differential DNA methylation patterns in ACI rat mammary gland tissue [32]. Further investigation into the role of oestrogen on repetitive element methylation is, therefore, warranted. It is also possible that age at menarche is an indirect indicator of other macroscopic changes that may impact on DNA methylation.  It is widely accepted that development is plastic and the sensitivity of the epigenetic system to environmental factors is heightened during periods of developmental plasticity such as childhood, adolescence and puberty [17]. Epigenetic modifications in response to environmental exposures at these critical periods are often subtle initially and even though they do not lead to phenotypic changes at the time of exposure, they may lead to increased risk of dysfunction and disease later on in life [17]. Trends in the past decades show a rapid shift towards an earlier age at menarche and this is more pronounced in developed countries [33]. This trend is too steep to be attributed to genetic changes. Instead, environmental exposures at the periods of developmental plasticity are likely to be the cause of the dramatic decrease in age at menarche. For example, childhood obesity disrupts the hormonal milieu leading to an increase in adipocyte secreted leptin, or in adrenal secreted androgens, all of which impact on menarcheal onset [34]. Pre or neo-natal nutrition as well as early life exposure to endocrine disrupting chemicals (EDCs) can also lead to hormonal imbalances impacting on age at menarche [17]. Given that these exposures occur at the periods when the epigenetic signature is more plastic, they might also lead to aberrant DNA methylation changes which will be inherited during cell divisions and be detectable years later. Therefore, the aberrant DNA methylation pattern observed in adulthood might not be related to menarcheal age per se but to an early life environmental exposure, like diet, that impacts both on age at menarche and on DNA methylation.
This study suggests an association between age at menarche and DNA methylation. All samples in this study were collected prior to the onset of disease, and the changes observed were present in the blood of individuals when they were still healthy, at least five years prior to their diagnosis, limiting the potential influence of the presence of cancer (reverse causality) on the methylome of blood DNA. In addition, the sample sizes examined -376 subjects for LUMA and 332 subjects for Illumina 450 K Methylation are fairly large datasets, allowing for sufficient power to detect significant methylation changes if present. However, one important limitation of our study was the lack of information on other early life exposures, therefore it was impossible to investigate whether such exposures confound the observed association between age at menarche and DNA methylation. Thus, this hypothesis needs to be further investigated in birth cohorts.
Overall, our results suggest that DNA methylation changes, particularly in repetitive elements, may be associated with menarcheal age. However, it is also possible that some important changes taking place in early life and which are associated with age at menarche -in particular nutrition -have a detectable effect on methylation levels.

Stage 1: Genome Wide Methylation with LUminometric Methylation Assay (LUMA)
Study participants. All participants signed an informed consent and the study protocol was approved by the Ethics Committee of the International Agency for Research on Cancer.
Epidemiologic data and blood samples collected from the European Prospective Investigation into Cancer and Nutrition (EPIC) were used. EPIC is an ongoing study designed to investigate diet, nutrition, lifestyle and environmental factors with respect to cancer incidence. The cohort consists of 519,978 participants from 23 centres in 10 European countries -Denmark, France, Germany, Greece, Italy, the Netherlands, Norway, Spain, Sweden and the United Kingdom. Information on lifestyle, diet, anthropometric measures and environmental exposures were collected using questionnaires at recruitment and were standardized across the different participating centres. Blood was also collected from the majority of subjects at recruitment [35]. For the LUMA investigation, 600 individuals -half breast cancer cases and half controls -from the EPIC cohort were chosen. Of these, 77 subjects were initially excluded: 1 subject was a duplicate, there was not enough DNA for 24 subjects, and 52 samples produced no or a weak signal. Of the remaining 523 subjects with reliable measurements, we investigated 376 women in this specific study, who remained free of cancer for at least 5 years following blood collection.
Genome wide DNA methylation. LUMA was used to quantify genome wide methylation levels [27,36] in PBLs in the blood of subjects, collected at recruitment. Genomic DNA was extracted using standard protocols. LUMA gives a measure of % global methylation using the ratio of DNA cleavage by methylation sensitive (HpaII) and methylation insensitive (MspI) restriction enzymes. In LUMA, polymerase extension assay by Pyrosequencing is employed to determine cleavage. The LUMA method was validated using DNA controls of known DNA methylation status [37]. In the assay, 5-Aza-dC treated and CpG methylated Jurkat genomic DNA (New England Biolabs, Ipswich, MA) were used as methylated and unmethylated control samples. Genome wide methylation is expressed as a percentage obtained from the equation [37]: Statistical analyses. We first compared the distribution of a number of anthropometric measures, reproductive factors, and lifestyle characteristics such as age at blood collection, height, weight, parity, age at first full term pregnancy, breastfeeding and hormone use across quartiles of percent global methylation. The quartile cut offs were the 25 th , 50 th , and 75 th percentile methylation values in controls. For continuous variables, the non-parametric equivalent of one-way analysis of variance (ANOVA), the Kruskal-Wallis test, was used as genome wide methylation was not normally distributed. For categorical variables, chi-square test was used.
The reproductive variables that were statistically differentially distributed between methylation quartiles were investigated further. The resulting profile of genome wide methylation distribution was skewed and several transformations failed to normalize it. In addition, various GLM models investigated failed to adequately describe the outcome distribution. As a result, the methylation outcome was dichotomized -above and below median methylation -and unconditional logistic regression was used to evaluate the association between exposure variables and DNA methylation, the latter being the dependent variable. All significant variables were included both as continuous and categorical when relevant (e.g. age at menarche #11 y, 12-14 y, $15 y). Based on the available literature, the logistic regression model was fully adjusted for centre, plate number, age at blood collection, height, weight, total physical activity, smoking status, daily alcohol consumption, and daily folate consumption. All confounders were entered into the model as continuous variables with the exception of smoking status which was treated as categorical -past, never, present.
All analyses were performed using STATA (Release 11; College Station, TX: StataCorp LP).
Stage 2: Locus-by-locus Analysis with Illumina 450 K to Replicate the Findings of LUMA Genome Wide Methylation Analysis Study participants. All participants signed an informed consent and the study protocol was approved by the ethics committee of the Human Genetics Foundation (HuGeF).
The EPIC Italy sub-cohort consists of 32,578 female subjects recruited from 5 different centers -Varese, Turin, Florence, Naples, and Ragusa. From this subcohort, 166 breast cancer cases and 166 controls, matched on date of birth (65 years), seasonality of blood draw, and date of recruitment were selected. However, since for this investigation case/control status is not the outcome and since at the time of blood collection all individuals were healthy, all 332 blood samples were treated as healthy blood. Nevertheless, given the long latency period of neoplasia, analyses were carried out on all subjects with age at menarche information (n = 329) as well as on only the subjects that remained healthy at least 5 years following recruitment and blood collection (n = 240).
Illumina 450 K methylation. DNA was extracted from buffy coats or blood cell fractions using the QIAsymphony DNA Midi Kit (Qiagen, Crawley, UK). 500 ng of DNA was bisulphiteconverted with the EZ-96 DNA Methylation-Gold TM Kit, used according to the manufacturer's protocol (Zymo Research, Orange, CA, USA). Next, the 450 K DNA methylation array by Illumina (Infinium HumanMethylation 450 BeadChip) was performed on 4 ml of bisulphite-converted DNA, following the Illumina Infinium HD Methylation protocol. This array includes 485,577 cytosine positions of the human genome (482,421 CpG sites (99.4%), 3091 non-CpG sites and 65 random SNPs; hereafter the term CpG will be used to refer to all of these, unless otherwise specified). Briefly, a whole genome amplification step was followed by enzymatic end-point fragmentation and hybridization to HumanMethylation 450 BeadChips at 48uC for 17 h, followed by single nucleotide extension. The incorporated nucleotides were labelled with biotin (ddCTP and ddGTP) and 2,4-dinitrophenol (DNP) (ddATP and ddTTP). After the extension step and staining, the BeadChip was washed and scanned using the Illumina HiScan SQ scanner. The intensities of the images were extracted using the GenomeStudio (v.2011.1) Methylation module (1.9.0) software, which normalizes within-sample data using different internal controls that are present on the HumanMethylation 450 BeadChip and internal background probes. The Infinium HumanMethylation 450 BeadChip data, for subjects with age at menarche information, were made available on the data repository Gene Expression Omnibus (GEO), with accession number GSE51057.
Statistical analyses. Methylated and unmethylated intensities for each probe were provided by GenomeStudio software. In addition to the methylation value for each probe, corresponding detection p-values were also provided by GenomeStudio software (Illumina). The detection values indicate the confidence that can be placed on a b-value reading. As a first step, all readings with a p-value.0.05 were considered as non-detected so as to not influence downstream pre-processing and analyses.
Background noise correction was then performed as background fluorescence can contribute an additive error to each signal intensity leading to a reduced dynamic range for the methylation reading. Given that signal intensities can be red or green, and given the technical variation in fluorescent signal depending on the intensity colour, dye bias also had to be taken into account using the method described by Triche et al. [38]. The analysis of other classical quality control measures (such as staining, extension, hybridization, or bisulphite conversion) provided by GenomeStudio did not reveal any major quality issues.
The methylated and unmethylated intensities provided by GenomeStudio were used to calculate methylation b-values based on the equation: where M is the intensity of the methylated signal and U the intensity of the unmethylated signal at each probe.
Beta values were later peak based corrected (PBC) as suggested by Dedeurwaerder et al. [39] in order to correct for the bias arising from the two different probe designs on the array. In order to correct for batch effects, COMBAT was then used [40,41]. Lastly, missing data were imputed using KNN (k-nearest neighbours) method once, implemented in knn.impute function from R-CRAN.
In order to replicate the results observed with LUMA, methylation beta values across all probes were averaged per individual to derive a measure of genome-wide methylation per subject. Similarly, probes in CpG islands and promoter regions were averaged per subject. Wilcoxon-rank sum tests were performed to examine whether genome-wide, CpG island and promoter methylation was significantly different between subjects in the three menarcheal age groups examined with LUMA (#11 y, 12-14 y, $15 y).
In addition, locus by locus analysis of methylation was performed using a linear regression model. Quantile normalization was performed prior to regression using the R package ''preprocessCore'' from Bioconductor. In order to satisfy the normality assumptions of a linear regression model, beta methylation values were converted to M-values as described in [42] and entered into the model as a continuous outcome. Mvalues were also peak based corrected and COMBAT adjusted for chip number to correct for batch effects. Only age at menarche, the only significant reproductive variable in LUMA analysis, was investigated here. Age at menarche was treated as a categorical outcome (#11 yrs vs. .11 yrs). This categorization was chosen since in the LUMA data, both menarcheal age categories above 11 yrs old were significantly associated with methylation when compared to a menarcheal age of #11 years.
Q-values, measuring the maximum False Discovery Rate (FDR) from the Benjamini and Hochberg method were derived for each probe analysis, and overall Type I error was controlled for by conservatively applying Bonferroni multiple testing correction. The per-test significance cut-off value was set to 1.00610 27 . The analysis was performed first on all 329 subjects with age at menarche information and then repeated only on the 240 subjects that remained healthy for at least five years following recruitment. The linear model was adjusted for age, case-control status and chip position. All confounders were entered into the model as continuous variables with the exception of chip position.
Stage 3: Single Locus (SMAD6, cg01339004) Pyrosequencing Analysis to Replicate the Findings of Locus by locus Illumina 450 K Analysis Study participants. The participants used in this stage were also subjects of the EPIC Italy sub-cohort. One hundred ninetyfive women, with available information on age at menarche, were selected for bisulphite pyrosequencing based on sample availability from other studies. Eight of these subjects overlapped with the subjects used in Stage 2.
Statistical analyses. Out of the 195 samples analysed, for 10 samples the pyrosequencing results did not pass quality check. Wilcoxon rank sum non-parametric test was performed to examine whether SMAD6 cg01339004 methylation was different between age at menarche categories (#11 vs. .11 years). In addition, a generalized linear regression model with SMAD6 cg01339004 methylation as a continuous outcome and with age at menarche as a categorical variable (#11 vs. .11 years) was run to correct for confounding variables -age at blood collection and case-control status -as in the case of the Illumina 450 K analysis.