White blood cell DNA methylation and risk of breast cancer in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO)

Background Several studies have suggested that global DNA methylation in circulating white blood cells (WBC) is associated with breast cancer risk. Methods To address conflicting results and concerns that the findings for WBC DNA methylation in some prior studies may reflect disease effects, we evaluated the relationship between global levels of WBC DNA methylation and breast cancer risk in a case-control study nested within the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) cohort. A total of 428 invasive breast cancer cases and 419 controls, frequency matched on age at entry (55–59, 60–64, 65–69, ≥70 years), year of entry (on/before September 30, 1997, on/after October 1, 1997) and period of DNA extraction (previously extracted, newly extracted) were included. The ratio of 5-methyl-2'-deoxycytidine [5-mdC] to 2'-deoxyguanosine [dG], assuming [dG] = [5-mdC] + [2'-deoxycytidine [dC]] (%5-mdC), was determined by liquid chromatography-electrospray ionization-tandem mass spectrometry, an especially accurate method for assessing total genomic DNA methylation. Results Odds ratio (OR) estimates and 95% confidence intervals (CI) for breast cancer risk, adjusted for age at entry, year of entry, and period of DNA extraction, were 1.0 (referent), 0.89 (95% CI, 0.6–1.3), 0.88 (95% CI, 0.6–1.3), and 0.84 (95% CI, 0.6–1.2) for women in the highest (referent) through lowest quartiles of %5-mdC (p for trend = .39). Effects did not meaningfully vary by time elapsed from WBC collection to diagnosis. Discussion These results do not support the hypothesis that global DNA hypomethylation in WBC DNA is associated with increased breast cancer risk prior to the appearance of clinical disease.


Background
Cytosines can be methylated in the mammalian genome, primarily at CpG sites. CpG sites are located throughout the genome, including gene promoter regions and repetitive DNA sequences. DNA methylation patterns play a key role in gene expression and cell integrity. For example, genome-wide hypomethylation can be associated with chromosomal instability and the expression of oncogenes or repetitive sequences that are normally silenced by methylation [1]. Global loss of DNA methylation is characteristic of cancer tissue [2].
There are several different methods to assess global DNA methylation. Measurement of the ratio of 5-methyl-2'-deoxycytidine [5-mdC] to 2'-deoxyguanosine [dG], assuming [dG] = [5-mdC] + [2'-deoxycytidine [dC]] (%5-mdC), by liquid chromatography-electrospray ionization-tandem mass spectrometry (LC-ESI-MS/MS) provides a comprehensive measure of genome-wide DNA methylation levels [3]. This method is considered the gold standard because it evaluates the entire genome, but it is expensive and time-consuming and requires specialized laboratory equipment. Approximately one-third of the DNA methylation in the genome occurs in repetitive sequences, including LINE-1 and Alu [4]. For this reason, DNA methylation levels in LINE-1 or Alu repeats, which can be obtained by high-throughput methods, have often been used as surrogate markers of global DNA methylation [5]. Other high-throughput surrogate methods for estimating global methylation are available. For example, the luminometric methylation assay (LUMA) [6] estimates global DNA methylation levels by using restriction enzymes specific for methylated and unmethylated CCGG, a sequence found throughout the genome. A different approach has been to average methylation levels across the limited set of individual CpG sites represented on the Illumina human methylation bead kits [7,8]. An advantage of this latter approach, relative to aggregate methods of assessing global methylation (e.g., %5-mdC, LINE-1), is that one can adjust for potential differences in blood cell composition between cases and controls in archived blood specimens [9]. A second advantage is that it is possible to conduct subanalyses to examine methylation in specific locations in the genome (e.g., promoter regions across the genome) [7,8].
Global DNA hypomethylation of breast tumor tissue is well-established [2]; however, there is some evidence that global hypomethylation in circulating white blood cell (WBC) DNA may also be associated with an increased risk of breast cancer. One possible explanation is that the association represents unidentified environmental and lifestyle determinants that influence both global methylation and breast cancer risk. An alternative possibility is that, in response to very early, preclinical breast cancer, a new clone of circulating leukocytes arises that alters white blood cell DNA methylation [10]. In a relatively small retrospective case-control study, Choi and colleagues [11] observed a nearly threefold increase in breast cancer risk among women in the lowest tertile of %5-mdC in WBC DNA compared to women in the highest tertile. Comparable results were obtained prospectively in a case-cohort study nested in the NIEHS Sister Cohort Study [12], with a nearly twofold increased risk observed among women in the lowest quartile of LINE-1 WBC DNA methylation compared with those in the highest quartile. By contrast, a number of other investigations, including three separate nested case-control studies from Europe that used pre-diagnostic DNA (and were presented in a single publication) [13], observed no association between LINE-1 methylation in WBC DNA and breast cancer risk [11,13-18] or between Alu methylation in WBC DNA and breast cancer risk [14,16,19].
Findings from the three retrospective case-control studies employing the LUMA assay were inconsistent, with positive, inverse, and null associations [15,20,21]. However, three of the four nested case-control studies that used the Illumina HumanMethylation450 BeadChip on prediagnostic DNA (separate results from three cohorts were reported in a single paper [8]) observed that global hypomethylation was associated with increased breast cancer risk [7,8].
To address these discrepant results, we examined the association between global hypomethylation in WBC DNA and subsequent breast cancer incidence in a study nested in the Prostate, Lung, Colorectal and Ovarian Cancer Screening (PLCO) Trial cohort. We measured WBC DNA %5-mdC levels by LC-ESI-MS/MS because: (1) WBC global hypomethylation as measured by %5-mdC level was found to be strongly and significantly associated with increased breast cancer risk in the one breast cancer study that measured %5-mdC, which was a relatively small retrospective study [11]; (2) global hypomethylation as measured by WBC DNA %5-mdC was reported to be more strongly associated with overall cancer risk than surrogate measures of global methylation (LINE-1, Alu, and LUMA) in a recent meta-analysis [22]; and (3) %5-mdC level measured by LC-ESI-MS/MS is considered the gold standard assay for accurately assessing methylation across the entire genome.
Our study is important because it is the first investigation to examine the association between WBC DNA %5-mdC levels, measured prior to breast cancer diagnosis, and subsequent breast cancer incidence. In this large study of 428 cases and 419 controls, the elapsed time between blood collection and breast cancer development ranged from 1.0 to 9.5 years, enabling us to examine whether risk varied by time elapsed between blood collection and diagnosis.

Selection of study subjects
Cases and controls for the present analysis were selected from the Etiology and Early Marker Study (EEMS) breast cancer case-control study that was established from the 39,115 women randomized to the intervention arm of the PLCO screening trial [23]. Through June 30, 2005, a total of 1141 eligible cases of breast cancer were identified. A total of 1141 controls were frequency matched to cases by randomly sub-sampling women who had not been diagnosed with breast cancer by June 30, 2005 in eight strata defined by four age categories (55-59, 60-64, 65-69, ≥70 years) and time of entry into the study (on/before September 30, 1997, on/after October 1, 1997).
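The frequency matching described above can be illustrated with a minimal sketch, assuming a simple in-memory representation of subjects. The function name `frequency_match`, the toy records, and the stratum labels are hypothetical and are not taken from the study; the sketch only shows the general idea of randomly sub-sampling controls within strata so that the control distribution mirrors the case distribution.

```python
import random
from collections import Counter, defaultdict

def frequency_match(cases, candidates, key, seed=0):
    """Randomly sample, within each stratum, as many controls as there are cases."""
    rng = random.Random(seed)
    cases_per_stratum = Counter(key(c) for c in cases)
    # Group the candidate controls by stratum.
    pool = defaultdict(list)
    for person in candidates:
        pool[key(person)].append(person)
    controls = []
    for stratum_label, n_cases in cases_per_stratum.items():
        available = pool[stratum_label]
        # Take n_cases controls, or every available one if the stratum is short.
        controls.extend(rng.sample(available, min(n_cases, len(available))))
    return controls

# Hypothetical toy records: (id, age category, entry period).
cases = [(1, "55-59", "early"), (2, "55-59", "early"), (3, "70+", "late")]
candidates = ([(100 + i, "55-59", "early") for i in range(5)]
              + [(200 + i, "70+", "late") for i in range(4)])

def stratum(person):
    # Stratum = age category crossed with entry period (4 x 2 = 8 strata in the study).
    return (person[1], person[2])

controls = frequency_match(cases, candidates, stratum)
print(Counter(stratum(c) for c in controls))
```

In this toy example each stratum contributes the same number of controls as cases, which is the property frequency matching aims for.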
A total of 732 cases and 928 controls were initially identified as eligible for the present analysis, after further excluding in the following order: subjects who did not give permission for genetic studies (cases = 32, controls = 25), subjects who had a personal history of any cancer prior to the trial (cases = 40, controls = 30), subjects with unconfirmed, erroneous or in situ breast cancer (cases = 229, controls = 13), subjects who developed other types of cancer anytime during follow up (cases = 105, controls = 130), and other reasons (cases = 3, controls = 14).
A total of 649 out of the 732 cases and 787 out of the 928 controls either had: (1) DNA already extracted from buffy coat/whole blood remaining from a prior study; or (2) buffy coat available for extraction. We further excluded 83 breast cancer cases in which diagnosis occurred within one year of blood collection, to minimize the likelihood of disease effects, leaving an eligible pool of 566 cases and 787 controls.
For efficiency, our a priori plan was to select 430 cases and 430 controls for this analysis. We first prioritized selection of the 151 cases and 147 controls that already had DNA extracted as part of another study [24]. We then supplemented study subject selection to include cases and controls with buffy coat available for DNA extraction for this study. Controls were frequency matched to cases on age, calendar year of entry, and the period of DNA extraction (already extracted DNA or newly extracted DNA). Matching on period of DNA extraction was done to address the possible concern that DNA methylation patterns may be affected by the timing or method of DNA extraction. We ultimately selected 428 breast cancer cases and 420 controls that had suitable DNA for analysis, slightly fewer than our goal because some of the subjects we originally selected turned out to have inadequate DNA. One additional control subject was later excluded for an improbable value for %5-mdC (67.4%). Thus, our final analysis consisted of 428 cases and 419 controls. The institutional review boards of the National Cancer Institute, the 10 participating study centers, and the University of Massachusetts Amherst approved this study. Informed consent was obtained from all participants at study enrollment.

DNA extraction
For 298 study subjects, DNA was previously extracted in 2006-2007 from stored buffy coat (98%) or whole blood (2%). About 90% of this DNA was extracted using the Autopure method (Qiagen) and the remaining specimens were extracted using standard phenol/chloroform extraction. In 2014, DNA was extracted for the remaining study subjects using the QIAsymphony SP automated extraction robot (Qiagen). DNA concentrations were quantified using the PicoGreen assay and NanoDrop spectrophotometry.

DNA hydrolysis
To provide individual nucleosides for subsequent total methylated cytosine measurements, genomic DNA was hydrolyzed with DNA Degradase Plus (Zymo Research, Cat # E2021) following the manufacturer's protocol with minor adjustments. Briefly, 400 ng of genomic DNA was incubated with 5 U of DNA Degradase Plus in a 25 μL total reaction volume at 37°C for 2 hours. Batch control DNA included female genomic DNA (Promega, Cat # G1521), which was considered to be "normally" methylated. Complete DNA hydrolysis of additional control samples was verified by agarose gel electrophoresis. Hydrolyzed DNA samples were stored at −20°C until %5-mdC analyses.

Measurement of %5-mdC
The %5-mdC levels were determined by LC-ESI-MS/MS after hydrolysis of DNA as described by Song and colleagues [3], with modifications. Both 5-mdC and dG concentrations were quantified with internal standard additions of isotope-labeled 5-mdC and dG (5-mdC-d3 and 15N5-dG, respectively; obtained from Toronto Research Chemicals). LC separation was performed on an Acquity UPLC system (Waters Corporation) at a flow rate of 400 μL/min. Methanol containing 0.1% formic acid and water containing 0.1% formic acid were used as buffers. The organic buffer ratio was increased at a linear gradient from 0 to 22.5% over 8 min for the elution of nucleosides. The sample injection volume was 20 μL. Detection by ESI-MS/MS was performed on a Quattro Premier XE Mass Spectrometer (Waters Corporation) following LC separation. The following optimized conditions for ESI positive ion mode were used: source temperature, 120°C; desolvation gas flow, 700 L/h; cone gas flow, 50 L/h; capillary voltage, 4.2 kV; cone voltage, 10 V; extractor voltage, 2 V; entrance potential voltage, 0 V; collision energy, 11 V; and collision cell exit potential, 2.0 V. Multiple reaction monitoring mode was utilized for the quantification of native and labeled nucleosides. The transition pairs of molecular and fragment ions monitored were m/z 242.0/126.0 for 5-mdC, m/z 245.0/129.0 for 5-mdC-d3, m/z 268.1/152.0 for dG, and m/z 273.0/157.0 for 15N5-dG, with a scan time of 150 ms for each pair. Following QuanLynx (Waters Corporation) analysis for chromatographic peak detection, the resulting peak areas of the native nucleosides were normalized to the labeled internal standards and quantified based on external calibration curves. The 5-mdC and dG nucleosides for external calibration were obtained from Fisher Scientific. We took the average of two injections from each sample vial. Results are reported as the ratio between 5-mdC and dG, assuming that [dG] = [5-mdC] + [dC].
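The quantification steps above (normalizing native peak areas to labeled internal standards, converting to concentrations with an external calibration curve, taking the ratio of 5-mdC to dG, and averaging two injections) can be sketched as follows. This is a minimal illustration that assumes a linear calibration of the form ratio = slope × concentration + intercept; the function names, peak areas, and calibration coefficients are hypothetical and do not come from the study or from QuanLynx.

```python
from statistics import mean

def concentration(native_area, labeled_area, slope, intercept=0.0):
    """Convert an internal-standard-normalized peak area ratio to a concentration.

    Assumes a linear external calibration: ratio = slope * conc + intercept.
    """
    ratio = native_area / labeled_area
    return (ratio - intercept) / slope

def percent_5mdc(conc_5mdc, conc_dg):
    """%5-mdC = 100 * [5-mdC] / [dG], with [dG] standing in for [5-mdC] + [dC]."""
    return 100.0 * conc_5mdc / conc_dg

# Hypothetical peak areas (native, labeled) for two injections of one sample.
injections = [
    {"mdc": (4200.0, 10000.0), "dg": (50000.0, 10000.0)},
    {"mdc": (4000.0, 10000.0), "dg": (50000.0, 10000.0)},
]
# Hypothetical calibration coefficients: (slope, intercept) per analyte.
calibration = {"mdc": (0.1, 0.0), "dg": (0.05, 0.0)}

per_injection = []
for inj in injections:
    c_mdc = concentration(*inj["mdc"], *calibration["mdc"])
    c_dg = concentration(*inj["dg"], *calibration["dg"])
    per_injection.append(percent_5mdc(c_mdc, c_dg))

# The reported value is the average of the two injections.
sample_percent_5mdc = mean(per_injection)
print(round(sample_percent_5mdc, 2))
```

Dividing by the labeled internal standard before calibration is what makes the estimate robust to injection-to-injection differences in ionization efficiency.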
Laboratory personnel were blinded to case status.
Samples were run on twelve plates, with cases and controls distributed on each plate approximately evenly within eight strata defined by age and time of entry. Each plate had DNA that was either all pre-extracted or all newly extracted. On each plate, we also included three replicate DNA specimens from three women who were in a study site for the PLCO trial that was later dropped (newly extracted DNA). The mean inter-batch coefficient of variation across the twelve plates for each of the three women was 8, 7, and 9%, respectively. The female genomic DNA average inter-batch coefficient of variation was comparable at 9.2%.
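For readers unfamiliar with the metric, a coefficient of variation (CV) is the standard deviation expressed as a percentage of the mean. The sketch below shows one way to compute it for a quality-control specimen measured on each of twelve plates. The values are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify the exact CV calculation.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical %5-mdC values for one replicate specimen, one value per plate.
replicate_by_plate = [4.1, 3.8, 4.4, 4.0, 3.7, 4.3, 4.5, 3.9, 4.2, 3.6, 4.0, 4.4]

cv = coefficient_of_variation(replicate_by_plate)
print(f"inter-batch CV: {cv:.1f}%")
```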
In Fig. 1, we show levels of %5-mdC for study subjects in each of the twelve batches and also separately for cases and controls in each batch. Levels of %5-mdC varied by batch and showed limited variation across individuals within batch.

Breast cancer ascertainment
Breast and other cancers were primarily identified through an annual study update mailed to participants, which established cancer diagnosis in the previous year, including type and date [25]. Non-respondents were contacted by mail and telephone. In order to confirm the self-reported cancers, medical records (for standardized medical record abstraction of pathology reports) were retrieved and usually obtained within 2 years of self-report. Cancers were also identified through death certificates, data obtained from state cancer registries, and information from next-of-kin for deceased participants.

Ascertainment of other variables
Demographic information, medical history, and health-related behavior were obtained through baseline questionnaires completed by study participants at or around the time of randomization.

Statistical analysis
Logistic regression was used to estimate odds ratios and 95% CIs for breast cancer, with case/control status as the outcome and a categorical variable denoting quartiles of the %5-mdC levels as the primary exposure of interest. Because of the evidence of inter-batch variation, we created %5-mdC quartile levels separately for each batch and then created a new summary variable with four levels based on the batch-specific cut points. For example, individuals in the lowest category of the summary variable included individuals from each of the twelve batches who were placed in the lowest batch-specific quartile ranking. Models were adjusted for matching variables, including age in four categories (55-59, 60-64, 65-69, ≥70 years), time of entry into the study (on/before September 30, 1997, on/after October 1, 1997), and period of DNA extraction (previously extracted, newly extracted). Additional models considered adjustment for established or suspected breast cancer risk factors, including race, body mass index, age at menarche, age at first live birth, number of children, type of menopause, age at natural menopause, personal history of benign breast disease, cigarette smoking, recent alcohol intake, family history of breast cancer, and menopausal hormone use. Statistical significance was assessed for each level of the primary exposure variable using two-sided Wald hypothesis tests.
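The batch-specific quartile variable described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes cut points are computed from all subjects in a batch (the text does not say whether controls only were used) and relies on the default interpolation method of Python's `statistics.quantiles`. The function name and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

def batch_specific_quartiles(records):
    """Map each subject to a quartile (1 = lowest, 4 = highest) within their batch.

    records: iterable of (subject_id, batch, value) tuples.
    """
    values_by_batch = defaultdict(list)
    for _, batch, value in records:
        values_by_batch[batch].append(value)
    # Three cut points per batch split that batch's values into four groups.
    cuts = {b: quantiles(v, n=4) for b, v in values_by_batch.items()}
    summary = {}
    for subject_id, batch, value in records:
        q1, q2, q3 = cuts[batch]
        summary[subject_id] = 1 + (value > q1) + (value > q2) + (value > q3)
    return summary

# Hypothetical data: batch B reads systematically higher than batch A.
records = ([(f"A{i}", "A", float(i)) for i in range(1, 9)]
           + [(f"B{i}", "B", 10.0 + i) for i in range(1, 9)])

quartile = batch_specific_quartiles(records)
print(quartile["A8"], quartile["B1"])
```

In the toy data, subject B1 has a higher raw value than subject A8 yet falls in the lowest quartile of its own batch, which is how ranking within batch removes the batch offset from the exposure variable.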

Results
Relative risks for established breast cancer risk factors were generally comparable in our analytic subgroup of cases and controls to those reported previously in the literature [25] (Table 1). Nulliparity, later age at first birth, late age at natural menopause, a personal history of benign breast disease, a family history of breast cancer, and alcohol consumption were associated with increased breast cancer risk. Late age at menarche, three or more live births and surgical menopause were associated with reduced breast cancer risk.
The batch-standardized quartile distribution of %5-mdC was unrelated to the distribution of breast cancer risk factors or study design matching factors (Table 2).
In further analyses, we stratified on period of DNA extraction (Table 4). Among the group with previously extracted DNA, we observed a nonsignificant increase (OR = 1.44, 95% CI, 0.8-2.7) in risk in the lowest quartile of %5-mdC compared to those in highest quartile. In the group with newly extracted DNA, however, there was an unexpected decreasing trend in the ORs from the highest to lowest quartile of %5-mdC (p for trend = .05). Risk was also significantly decreased in the lowest quartile of %5-mdC compared to those in the highest quartile (OR = 0.61, 95% CI, 0.40-1.0).
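The estimates reported here come from adjusted logistic regression models. As a simplified illustration of how an odds ratio and its Wald-type 95% confidence interval relate to each other, the sketch below computes an unadjusted OR from a 2×2 table. The counts are hypothetical and are not the study's data, so the output is not expected to reproduce the values in Table 4.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases in the comparison group      b: controls in the comparison group
    c: cases in the referent group        d: controls in the referent group
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: lowest quartile (comparison) versus highest quartile (referent).
or_est, ci_low, ci_high = odds_ratio_wald(a=40, b=30, c=35, d=38)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that spans 1.0, as in this toy example, corresponds to a nonsignificant two-sided Wald test at the 0.05 level.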
We further stratified on both period of DNA extraction and elapsed time from blood collection to diagnosis. Inherent to the study design, all of the breast cancer cases diagnosed within 1-<2 years of blood collection had previously extracted DNA. When we restricted the analysis to study subjects with previously extracted DNA, we observed nonsignificant slight increases in risk in the lowest quartile level of %5-mdC in all three categories of years since blood collection (i.e., 1-<2, 2-<4, ≥4 years). When we restricted the analysis to study subjects with newly extracted DNA, however, we found no evidence of any increased risk among women in either the 2-<4 or the ≥4 years since blood collection category.

Discussion
Overall, we found no evidence that lower levels of %5-mdC in white blood cell DNA were associated with increased breast cancer risk in a case-control study nested in the PLCO cohort. The %5-mdC assay provides a comprehensive measure of genome-wide DNA methylation and is considered the gold standard for accuracy. Odds ratio (OR) estimates and 95% confidence intervals (CI) for breast cancer risk, adjusted for age at entry, year of entry, and period of DNA extraction, were 1.0 (referent), 0.89 (95% CI, 0.6-1.3), 0.88 (95% CI, 0.6-1.3), and 0.84 (95% CI, 0.6-1.2) for women in the highest (referent) through lowest quartiles of %5-mdC (p for trend = .39). There was some variability in our results depending on whether the DNA was previously or newly extracted. We observed a nonsignificant increased risk in the lowest quartile of %5-mdC in the subset of women with previously extracted DNA, whereas risk was significantly decreased in the lowest quartile of %5-mdC in the subset of women with newly extracted DNA. One possibility is that this difference in the results by period of DNA extraction is the result of technical issues or sample degradation. Conceivably, the earlier method of extraction or shorter buffy coat storage of the previously extracted samples may have resulted in less non-differential misclassification in our methylation measure. In a recent reliability study of methylation measures from mononuclear cells using the HumanMethylation450K Bead Array, differences in DNA extraction methods (and possible differences in cell composition resulting from them) were suggested to have contributed to lower observed reliability for repeated samples across studies than for technical replicates within a study [26]. Given that our findings for previously and newly extracted DNA were in opposing directions and that there was a larger sample size of newly extracted DNA, chance due to small numbers after stratification is a likely explanation for the observed variability.
Our results are not consistent with those of the one other breast cancer study to measure global DNA hypomethylation with %5-mdC, a small case-control study in which WBC DNA was collected after breast cancer diagnosis [11]. In that study, risk of breast cancer was nearly three times higher among those in the lowest tertile of %5-mdC compared to those in the highest tertile. A major strength of our study is that we purposely included only breast cancer cases in which DNA was collected at least 12 months prior to diagnosis. We found no evidence that risk varied according to time elapsed between blood collection and breast cancer diagnosis.
As well-summarized in a recent systematic review [27], the weight of evidence also does not support an association between breast cancer risk and WBC DNA methylation, measured in other studies by surrogate marker methods, such as Alu, LINE-1, or LUMA. Of seven retrospective case-control studies [11,14-19] and four studies with prospectively collected pre-diagnostic WBC DNA (results from three cohorts were presented in a single publication) [12,13] that measured global methylation by Alu and LINE-1 methylation, only one [12] observed a significantly higher risk of breast cancer among those who had lower LINE-1 methylation levels. Three retrospective case-control studies have examined the relation between WBC DNA global methylation levels and breast cancer risk using LUMA, with inconsistent results [15,20,21]. As suggested by Brennan and colleagues [22], LINE-1 and other surrogate assays are likely not sufficiently sensitive to detect slight interindividual differences in WBC DNA methylation. Indeed, Brennan and colleagues reported that the population variability in WBC DNA LINE-1 methylation measured by pyrosequencing in prospectively collected blood did not statistically exceed that of technical duplicates [13]. Further, Tang and colleagues [27] noted that the findings from WBC DNA methylation studies that have evaluated LINE-1 and breast cancer have been null with one exception, despite using different methods of detection (e.g., combined bisulfite restriction analysis, pyrosequencing, MethylLight). Interestingly, a particular strength noted of the prospective study that detected a statistical association between WBC DNA LINE-1 hypomethylation and breast cancer risk was that it employed three independent bisulfite conversions, PCR, and pyrosequencing reactions on each sample [12].
In our prospectively collected blood specimens, we also observed limited population variability in %5-mdC levels in WBC DNA, adding to concern that even a small amount of laboratory error is problematic in studies that involve quantification of global WBC DNA methylation from healthy individuals.
Several recent studies have estimated global methylation by averaging individual CpG site-specific methylation levels over the hundreds of thousands of CpG sites on the Illumina HumanMethylation450 BeadChip [7,8]. This alternative approach measures less than 5% of the 28 million CpG sites in the genome [28]. In two prospective cohort analyses that used this approach to measure WBC DNA methylation, women in the highest quartile of methylation had about a 50% decrease in risk of breast cancer compared to women in the lowest quartile of methylation [7,8]. Another cohort analysis, which used pooled samples and next-generation sequencing of the overlapping CpG sites from the Illumina HumanMethylation450 BeadChip, also found higher levels of genome-wide methylation in controls than in cases [7]. However, findings from a fourth cohort study were null [7]. Subanalyses based on genomic region have been inconsistent. One study found that WBC epigenome-wide methylation in the promoter region was associated with an increase in breast cancer risk, whereas epigenome-wide methylation outside the promoter region was associated with a decreased risk of breast cancer [8]. A second study confirmed that WBC epigenome-wide methylation in gene bodies was associated with decreased risk but was unable to replicate the increase in risk with promoter region methylation [7]. These findings may need to be interpreted with caution given a recent report that measurement error and limited variability in DNA methylation measures from mononuclear cells are problematic for a substantial proportion of CpG sites on the HumanMethylation450 BeadChip [26].
A potential limitation of our study is that we studied %5-mdC in a composite of DNA from different types of white blood cells. As previously noted by others [29,30], the WBC distribution can vary across individuals and the level of global DNA methylation can vary by cell type. The study by Choi and colleagues also used composite DNA [11], as have nearly all other prior WBC methylation studies, because it is simple and yields the most DNA [30]. Another potential limitation of our study is the batch-to-batch variation in %5-mdC that we observed. This issue necessitated creating quartile cut points for %5-mdC separately for individuals within each laboratory batch, a method also employed in the case-control study by Choi and colleagues [11].
Table footnotes:
Missing values for cases and controls, respectively, were as follows: body mass index (1, 1), age at menarche (1, 0), number of live births (0, 2), age at first birth (0, 3), personal history of benign breast disease (11, 6), alcohol intake (35, 31), family history of breast cancer (8, 8), and hormone use (2, 3).
Q, quartile; OR, odds ratio.
Additionally adjusted for race, body mass index, age at menarche, age at first live birth, number of children, type of menopause, age at natural menopause, personal history of benign breast disease, cigarette smoking, alcohol intake, family history of breast cancer, and hormone use.
The same controls are used for each category of elapsed time from blood collection to diagnosis, within period of DNA extraction (previously extracted, newly extracted).