Exploring DNA methylation changes in promoter, intragenic, and intergenic regions as early and late events in breast cancer formation

Breast cancer formation is associated with frequent changes in DNA methylation, but the extent of very early alterations in DNA methylation and the biological significance of cancer-associated epigenetic changes need further elucidation. Pyrosequencing was done on bisulfite-treated DNA from formalin-fixed, paraffin-embedded sections containing invasive tumor, paired samples of histologically normal tissue adjacent to the cancers, and control reduction mammoplasty samples from unaffected women. The DNA regions studied were promoters (BRCA1, CD44, ESR1, GSTM2, GSTP1, MAGEA1, MSI1, NFE2L3, RASSF1A, RUNX3, SIX3 and TFF1), far-upstream regions (EN1, PAX3, PITX2 and SGK1), introns (APC, EGFR, LHX2, RFX1 and SOX9) and the LINE-1 and satellite 2 DNA repeats. These choices were based upon previous literature or publicly available DNA methylome profiles. The percent methylation was averaged across neighboring CpG sites. Most of the assayed gene regions displayed hypermethylation in cancer vs. adjacent tissue, but the TFF1 and MAGEA1 regions were significantly hypomethylated (p ≤ 0.001). Importantly, six of the 16 regions examined in a large collection of patients (105 to 129 per region) and in 15 to 18 reduction mammoplasty samples were already aberrantly methylated in adjacent, histologically normal tissue vs. non-cancerous mammoplasty samples (p ≤ 0.01). In addition, examination of transcriptome and DNA methylation databases indicated that methylation at three non-promoter regions (far-upstream EN1 and PITX2 and intronic LHX2) was associated with higher gene expression, unlike the inverse associations between cancer DNA hypermethylation and cancer-altered gene expression usually reported. These three non-promoter regions also exhibited normal tissue-specific hypermethylation that was positively associated with differentiation-related gene expression (in muscle progenitor cells vs. many other types of normal cells).
The importance of considering the exact DNA region analyzed and the gene structure was further illustrated by bioinformatic analysis of an alternative promoter/intron gene region for APC. We confirmed the frequent DNA methylation changes in invasive breast cancer at a variety of genome locations and found evidence for an extensive field effect in breast cancer. In addition, we illustrate the power of combining publicly available whole-genome databases with a candidate gene approach to study cancer epigenetics.


Background
Aberrant DNA methylation is a hallmark of cancer [1] and may function in various ways to influence transcription, as is the case in normal differentiation [2]. Comparisons of DNA methylation in cancers to methylation in an analogous normal tissue or to methylation in a variety of normal tissues revealed that cancer is very often associated with a global reduction in DNA methylation [3-5]. Hypermethylation of promoter regions overlapping CpG islands (CpG-rich DNA sequences), most notably in some tumor suppressor genes, is also a nearly universal feature of human cancer [6-9].
Because the terms 'hypermethylation' and 'hypomethylation' indicate changes relative to some appropriate standard [10], the choice of normal tissue for comparison is critical. In cancer patients, otherwise normal-appearing tissue adjacent to the tumor is often used as the normal control. However, such tissue can contain early changes in DNA methylation that may contribute to tumor initiation or may simply be markers of the onset of neoplasia [11,12]. In the present study, we address the prevalence of early DNA methylation changes and field effects (genetic or epigenetic abnormalities in tissues that appear histologically normal) in breast cancer development, using paired adjacent normal and invasive tissue from a total of 129 patients with breast cancer together with 18 reduction mammoplasty controls from cancer-free women. The DNA regions examined for differential methylation included promoters, far-upstream regions and introns, as well as DNA repeats. The gene-associated regions included tumor suppressor genes, stem cell-associated genes and transcription factor genes. The regions for analysis were chosen using findings from the literature and bioinformatics, especially epigenetic data from the Encyclopedia of DNA Elements (ENCODE) at the UCSC Genome Browser [13]. We also used bioinformatics to compare our DNA methylation results with those in The Cancer Genome Atlas (TCGA) [14], one of the most comprehensive public databases on DNA methylation changes in breast cancer. To elucidate the biological significance of our findings, we examined whole-genome expression data for breast cancers from TCGA as well as DNA epigenetic, chromatin epigenetic and transcriptome profiles from cell cultures represented at the UCSC Genome Browser [13,15].
Our results provide evidence for frequent field effects in breast cancer development and illustrate the power of combining whole-genome epigenome and transcriptome profiles with examination of individual gene regions.

Source of samples
Breast cancer patients (N = 129) came from the Breast Cancer Care in Chicago (BCCC) study and were diagnosed at one of many Chicago-area hospitals. The study was approved by the University of Illinois at Chicago (UIC) institutional review board. Women were between the ages of 30 and 79, self-identified as non-Hispanic White, non-Hispanic Black or Hispanic, resided in Chicago, had a first primary in situ or invasive breast cancer diagnosed between 2005 and 2008, and gave written consent to participate in the study and to allow the research staff to obtain samples of their breast tumors from the diagnosing hospitals. In addition, 18 unaffected, cancer-free women who underwent a reduction mammoplasty between 2005 and 2008 served as non-cancerous controls. The 18 control tissues were made available through a standardized protocol involving an honest broker within the UIC Department of Pathology. For all patients, hematoxylin and eosin (H&E)-stained slides from formalin-fixed, paraffin-embedded (FFPE) tissue blocks were examined to identify representative areas of invasive tumor, of histologically and morphologically normal-appearing breast tissue adjacent to the tumor, or of confirmed histologically normal tissue from reduction mammoplasty samples (referred to as control or 'non-cancerous' samples). For lumpectomies, adjacent breast tissue was usually chosen from the same block as the tumor; however, when available, a separate block containing breast tissue and no tumor was used as the non-malignant, adjacent sample. Tissue core samples were precisely cut from the selected area using a semiautomated tissue arrayer (Beecher Instruments, Inc.). Because the tissue was fixed and embedded in paraffin, cells from the invasive tissue could not become dislodged and contaminate the adjacent tissue or vice versa.

DNA methylation analysis
Paraffin was dissolved by adding 1 mL of clearing agent (Histochoice) and incubating at 65°C for 30 min. Samples were digested by adding 100 μL of digestion buffer consisting of 10 μL of 10X Target Retrieval Solution, high pH (DAKO, Glostrup, Denmark), 75 μL of ATL Buffer (Qiagen) and 15 μL of proteinase K (Qiagen), followed by incubation at 65°C overnight. Samples were then vortexed and checked for complete digestion. The sample volume was brought up to ~100 μL, and 20 μL of each sample was treated with bisulfite and purified using the Zymo EZ-96 DNA Methylation-Direct™ Kit, with a 15-min denaturation step at 98°C followed by a 3.5-h conversion at 64°C, an additional 15-min denaturation at 98°C and a 60-min incubation at 64°C. DNA was eluted in 40 μL of elution buffer. PCR was then performed with 0.2 μM of each primer, one of which was biotinylated, and the final PCR product was purified (Streptavidin Sepharose HP, Amersham Biosciences, Uppsala, Sweden), washed, alkaline-denatured and rewashed (Pyrosequencing Vacuum Prep Tool, Qiagen). Pyrosequencing primer (0.5 μM) was annealed to the purified single-stranded PCR product, and 10 μL of the PCR products were sequenced on a Pyrosequencing PSQ96 HS System (Biotage AB) following the manufacturer's instructions. The amplicon regions used are given in Table 1. The methylation status of each CpG site was analyzed individually as a T/C SNP using PyroMark Q96 software (Qiagen, Germantown, Maryland).
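Treating each CpG as a T/C SNP means that, after bisulfite conversion and PCR, methylated cytosines read as C and unmethylated cytosines read as T. The minimal sketch below assumes that percent methylation at a CpG is simply the C signal divided by the summed C and T signals, then averages the per-CpG values across neighboring CpGs of one assay, as described under Statistical analysis. The exact algorithm of the vendor software is not described here and may differ; the function names and peak heights are hypothetical.

```python
from statistics import mean


def cpg_percent_methylation(c_signal: float, t_signal: float) -> float:
    """Percent methylation at one CpG read as a C/T "SNP" after bisulfite PCR.

    Methylated cytosines are protected from bisulfite conversion and read as C;
    unmethylated cytosines are converted and read as T after PCR.
    Simplifying assumption: methylation = C / (C + T).
    """
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no C or T signal at this CpG position")
    return 100.0 * c_signal / total


def assay_percent_methylation(cpg_signals):
    """Average per-CpG percent methylation across the neighboring CpGs of one assay."""
    return mean(cpg_percent_methylation(c, t) for c, t in cpg_signals)


# Hypothetical (C, T) peak heights for three adjacent CpGs in one amplicon.
signals = [(60.0, 40.0), (75.0, 25.0), (30.0, 70.0)]
print(round(assay_percent_methylation(signals), 1))  # 55.0
```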

Quality control of DNA methylation analysis
All primer-pairs passed tests for sensitivity, reproducibility and lack of amplification bias (EpigenDx, Hopkinton, MA). All reactions had negligible levels of persisting non-CpG cytosine residues. For each set of PCR primers, a dilution series of technical triplicates was examined with ≤15 ng of bisulfite-treated DNA. Primer-pairs were discarded if the signal for a single nucleotide peak was below 50 relative light units (RLUs). The signal-to-noise (S/N) ratio was calculated by dividing the RLU signal from a single nucleotide incorporation by the RLU value from a negative control nucleotide incorporation, and primer-pairs were discarded if the S/N ratio was less than 10. The reproducibility of percent methylation was also assessed, and primer-pairs were excluded if the coefficient of variation exceeded 5 %. The lack of amplification bias was demonstrated for each primer-pair by mixing different relative amounts of human placental DNA (Bioline, Taunton, MA) that had been methylated (with SssI methyltransferase) and amplified DNA left unmethylated (HGHM5 and HGUM5, EpigenDx). The empirically determined methylation values were compared with the known values, and an R² value > 0.9 was required for validation.
Table footnotes: TSG, tumor suppressor gene. f Although the sequences were in regions that did not meet the criteria to be classified as CpG islands (CGI) [13], the regions were rich in CpG compared with the average for human DNA. g A little-expressed, primate-specific gene, CCDC140, lies between PAX3 and the test region; its 5' end overlaps the 5' end of PAX3. h There are distant alternative 5' ends of these genes.
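The numeric acceptance criteria in this section (single-peak signal of at least 50 RLU, S/N of at least 10, coefficient of variation of at most 5 % and R² > 0.9 against mixtures of methylated and unmethylated DNA) can be expressed as a simple filter. This is a hypothetical sketch: the function names and example values are invented, and computing R² from an ordinary least-squares fit of observed on expected methylation is our assumption about how the comparison with known values was made.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Coefficient of variation in percent (sample SD divided by mean)."""
    return 100.0 * stdev(values) / mean(values)


def r_squared(expected, observed):
    """R^2 of an ordinary least-squares line of observed on expected methylation."""
    mx, my = mean(expected), mean(observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected, observed))
    sxx = sum((x - mx) ** 2 for x in expected)
    syy = sum((y - my) ** 2 for y in observed)
    return sxy * sxy / (sxx * syy)


def primer_pair_passes(peak_rlu, negative_control_rlu, replicate_pct,
                       expected_pct, observed_pct):
    """Apply the QC thresholds from the text to one primer-pair (hypothetical helper)."""
    if peak_rlu < 50:  # single-nucleotide peak too weak
        return False
    # A zero negative-control signal means an unbounded S/N, so the check is skipped.
    if negative_control_rlu > 0 and peak_rlu / negative_control_rlu < 10:
        return False  # signal-to-noise ratio below 10
    if coefficient_of_variation(replicate_pct) > 5:
        return False  # technical replicates not reproducible enough
    return r_squared(expected_pct, observed_pct) > 0.9  # linear response to mixtures


# Hypothetical data: 0-100 % methylated standards and the measured values.
expected = [0, 25, 50, 75, 100]
observed = [2, 24, 51, 77, 98]
print(primer_pair_passes(120, 5, [48.0, 50.0, 49.0], expected, observed))  # True
```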

Statistical analysis
Breast Cancer Care in Chicago pyrosequencing study
We conducted pyrosequencing methylation assays on 276 FFPE samples, including 258 samples of paired invasive and adjacent tissue from 129 patients with invasive breast cancer, as well as 18 reduction mammoplasty non-cancerous controls. Methylation values were averaged across multiple neighboring CpG sites to create a single value for percent methylation for each assay. Means and 95 % confidence intervals for percent methylation were estimated for each gene separately for control mammoplasty, adjacent and cancer samples. Differences between unpaired control mammoplasty vs. adjacent or cancer tissues were evaluated via p-values from independent Wilcoxon rank-sum tests, whereas differences between paired adjacent and cancer tissues were evaluated via p-values from dependent Wilcoxon signed-rank tests. Differences in means between adjacent and cancer tissues were also estimated by linear regression with generalized estimating equations to account for the paired nature of the samples, and 95 % confidence intervals were estimated via 1000 bootstrap replications with bias correction. These models were adjusted for patient age, race/ethnicity and tumor characteristics (stage at diagnosis, tumor grade and ER/PR status, which was either adjusted for or used for stratification). For differential methylation in cancer vs. adjacent tissue at DNA regions in the complete sample set, we used a significance level of p ≤ 0.001. For the DNA regions not pursued beyond the pilot phase, which were examined in only 37 pairs of cancer and adjacent tissue, we used a significance level of p ≤ 0.01.
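To show how the two nonparametric tests differ, the following pure-Python sketch implements the Wilcoxon signed-rank test for paired adjacent and cancer values and the Wilcoxon rank-sum test for unpaired comparisons against control mammoplasty samples. It uses the large-sample normal approximation with tie-corrected variances and no continuity correction, so its p-values can differ slightly from those of statistical packages (and from exact tests in small samples). It is an illustration rather than the software used in the study, it does not cover the GEE models or bootstrap intervals, and all data values are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def two_sided_p(z):
    """Two-sided p-value from a standard normal z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def signed_rank_test(adjacent, cancer):
    """Paired Wilcoxon signed-rank test; returns (W+, two-sided p)."""
    diffs = [c - a for a, c in zip(adjacent, cancer) if c != a]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    ties = sum(t ** 3 - t for t in Counter(abs_diffs).values())
    var = n * (n + 1) * (2 * n + 1) / 24 - ties / 48  # tie-corrected variance
    z = (w_plus - n * (n + 1) / 4) / math.sqrt(var)
    return w_plus, two_sided_p(z)


def rank_sum_test(control, other):
    """Unpaired Wilcoxon rank-sum test; returns (rank sum of control, two-sided p)."""
    n1, n2 = len(control), len(other)
    pooled = list(control) + list(other)
    n = n1 + n2
    ranks = average_ranks(pooled)
    r1 = sum(ranks[:n1])
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))  # tie-corrected variance
    z = (r1 - n1 * (n + 1) / 2) / math.sqrt(var)
    return r1, two_sided_p(z)


# Hypothetical percent methylation for 10 patients (adjacent, then paired cancer).
adjacent = [10, 12, 15, 20, 8, 9, 11, 14, 13, 16]
cancer = [a + d for a, d in zip(adjacent, range(1, 11))]
w, p = signed_rank_test(adjacent, cancer)
print(w, round(p, 4))
```

Here every cancer value exceeds its paired adjacent value, so W+ equals its maximum (55 for 10 pairs) and the p-value falls below 0.01 but not below 0.001, illustrating why the study's two significance thresholds matter.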

The Cancer Genome Atlas (TCGA) bioinformatics study
We examined methylation results for 192 samples of paired breast cancer and normal tissue from 96 patients, based on TCGA profiles [14] from the Infinium HumanMethylation450 array performed on frozen (not formalin-fixed) samples. Differences in methylation between paired normal and invasive tissues were evaluated using p-values from dependent Wilcoxon signed-rank tests.
Additionally, to examine the correlation between regional methylation and gene expression, data for invasive breast cancers with both methylation and gene expression results (N = 800) were obtained from the TCGA bioportal [16,17]. Methylation data were acquired using the Infinium HumanMethylation450 assay, and gene expression data were taken as z-scores from Illumina HiSeq 2000 Total RNA Sequencing Version 2. Spearman correlation coefficients were calculated to measure the association between the methylation level of each region and the expression level of the corresponding gene. The significance level for both of the TCGA analyses described above was defined as p ≤ 0.01. Lastly, other whole-genome databases that are part of the ENCODE project [18,19] and publicly available profiles of all mappable CpGs in control and cancer-derived breast epithelial cell cultures, obtained by next-generation sequencing of bisulfite-treated DNA (bisulfite-seq) [15], were examined for DNA methylation, transcription or histone modification as described in Results.
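Spearman's coefficient is the Pearson correlation computed on ranks, which makes it insensitive to the different scales of methylation values and expression z-scores. A minimal pure-Python sketch follows, assuming methylation is supplied as 0-1 beta values; the input values are hypothetical, ties receive average ranks, and the significance test at p ≤ 0.01 is not shown.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rho(methylation, expression):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(methylation) != len(expression) or len(methylation) < 2:
        raise ValueError("need two equal-length series with at least two values")
    rx, ry = average_ranks(methylation), average_ranks(expression)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical promoter-like pattern: higher methylation, lower expression.
beta = [0.10, 0.40, 0.20, 0.80, 0.60]
z_expr = [1.5, -0.2, 0.9, -1.1, -0.5]
print(spearman_rho(beta, z_expr))  # -1.0
```

A negative rho corresponds to the inverse, promoter-type relationship; a positive rho corresponds to the pattern reported below for the far-upstream and intragenic regions.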

Choice of regions for analysis
We chose a diverse set of genes and two DNA repeats (Table 1) to assay for DNA methylation in cancer, adjacent and control mammoplasty tissues. Eight of the 23 examined DNA regions overlapped or were near regions previously reported to be hypermethylated in breast cancer vs. non-cancerous breast tissue, namely, EGFR [20], GSTP1 [21], LHX2 [22], PITX2 [23], RASSF1A [24], RUNX3 [25], APC [26] and BRCA1 [27,28]; in addition, TFF1 [29] and the satellite 2 and LINE-1 DNA repeats [30,31] had been reported to be hypomethylated in breast cancer vs. normal breast. The first six of the above-mentioned hypermethylated gene regions also displayed hypermethylation in one or two breast cancer cell lines (MCF-7 and T-47D) relative to a human breast epithelial cell culture derived from normal breast tissue (human mammary epithelial cells, HMEC) and relative to most normal tissues, including breast tissue, as seen in genome-wide DNA methylation data (reduced representation bisulfite sequencing, RRBS) from the ENCODE project [5,13,19]. An additional seven gene regions (EN1, PAX3, SIX3, SOX9, RFX1, SGK1 and NFE2L3) were chosen mostly on the basis of hypermethylation profiled by RRBS in breast cancer cell lines (and often other cancer cell lines) vs. the above-mentioned normal cell cultures or tissues [13]. The first five of these genes had also been previously reported to display hypermethylation in non-breast neoplasms vs. control tissue [32-35]. Figure 1 illustrates ENCODE data at the UCSC Genome Browser [13] for the studied region far upstream of EN1, one of the gene regions chosen on the basis of RRBS DNA methylation data for breast cancer cell lines vs. control cells and tissues. EN1 encodes a homeobox-containing transcription factor that is implicated in the development of the nervous system and serves as a marker of certain neurons [36]. Underneath the diagrammed gene structure (Panel a) are the aligned CpG islands in the illustrated region (Panel b).
The tracks in Panel c show the DNA methylation status quantified at the RRBS-detected CpGs in a variety of cell cultures and normal tissues, using an 11-color, semi-continuous scale (see color key) to indicate the average DNA methylation level at each monitored CpG site (ENCODE/RRBS/HudsonAlpha Institute, [13]). The MCF-7 breast cancer cell line and several diverse cancer cell lines were hypermethylated throughout most of the gene and its upstream region relative to HMEC, normal breast tissue, other normal tissues and the majority of non-cancer cell cultures (Panel c; additional ENCODE data not shown [13]). The exceptions were normal muscle cell cultures (myoblasts and myotubes), but their methylation was confined to a smaller region that, unlike the hypermethylated region in MCF-7 cells, did not overlap the beginning of the gene. T-47D, the second breast cancer cell line in this RRBS database, was hypermethylated relative to HMEC but to a lesser extent than MCF-7 cells.
We also examined two gene regions (ESR1 and GSTM2) found to display hypermethylation preferentially in more aggressive breast cancers [37,38]. In addition, we studied CD44 and MSI1, which have been reported to have promoter hypomethylation in triple-negative breast cancers, that is, cancers that lack estrogen receptors (ER), progesterone receptors (PR) and human epidermal growth factor receptor 2 (HER2) [39]. The last gene region we examined was MAGEA1, which encodes a cancer-testis antigen that is not expressed in normal somatic tissues but is sometimes expressed in breast cancer [40]. Cancer-testis antigen genes are often hypomethylated in various kinds of cancer [41], although the methylation status of MAGEA1 in breast cancer was not known.
Fig. 1 Example of how some gene regions were chosen for examination in this study on the basis of available RRBS DNA methylation profiles for breast cancer cell lines and normal cell cultures and tissues visualized in the UCSC Genome Browser [13]. a, The EN1 gene structure, with exons as heavy horizontal bars; b, the aligned CpG islands in the illustrated region; c, DNA methylation (ENCODE/RRBS/HudsonAlpha) profiles for the indicated cell cultures and normal tissues, using an 11-color, semi-continuous scale (see color key) to indicate the average DNA methylation level at each monitored CpG site; d, aligned transcription results indicating that the non-transformed breast epithelial cell culture is not transcribing this gene despite its lack of DNA methylation. Paradoxically, normal myoblasts are transcribing it despite some upstream DNA methylation. All data are from ENCODE [19].

Samples and method used for DNA methylation analysis
The breast tissue samples analyzed for DNA methylation were invasive cancer (referred to as "cancer"), histologically normal tissue adjacent to the cancer (referred to as "adjacent tissue") and non-cancerous reduction mammoplasty samples (referred to as "control mammoplasty"). Characteristics of the 129 breast cancer patients and their tumors are listed in Table 2. The carcinomas were equally likely to be stage I vs. later stages and were equally distributed across histological grades, and one third of them lacked both estrogen and progesterone receptors. Before studying the full sample set, we conducted a pilot study of the 23 test regions using paired samples of cancer and adjacent tissue from 37 patients and samples from 18 reduction mammoplasty patients. Of the 23 test regions, 16 were then analyzed in an additional 92 patients with paired cancer and adjacent tissue samples, giving a total of 276 samples (paired samples from 129 patients plus 18 control mammoplasty samples).
Methylation analysis was performed by pyrosequencing of bisulfite-treated DNA. This method allowed us to monitor individual reactions for incomplete bisulfite modification and to check for PCR bias [42,43]. We used FFPE-derived DNA, which is partly degraded, is difficult to analyze because of crosslinking resulting from formalin fixation [44], and may be available only in small amounts. These problems are compounded by further degradation during the bisulfite treatment required for methylation analysis. Bisulfite-based pyrosequencing overcomes these problems and provides accurate quantification [43].
Variation in DNA methylation among samples of the same tissue type
As expected for cancer-linked DNA methylation changes [7], there was large variability in the average 5-methylcytosine (5mC) content at a given test region among individual cancer samples, as seen in the high standard deviation (SD) relative to the mean methylation values (Table 3). This between-sample variability contrasted with the much lower within-sample variability of technical duplicates observed in the pilot study (data not shown). Moreover, the control mammoplasty samples generally showed less variability in average 5mC content than adjacent or cancer samples (Table 3).
DNA hypermethylation in cancer vs. adjacent and control mammoplasty samples
Figure 2 (Panel a) displays the mean percent methylation and 95 % confidence limits for each of the 23 studied DNA regions, separately for control mammoplasty, adjacent and cancer samples. Hypermethylation in cancer vs. adjacent samples was seen at a significance level of p ≤ 0.001 for 12 of the 16 test regions in the large-scale study and at a significance level of p ≤ 0.01 for three of the seven regions not pursued beyond the pilot phase (Table 3). Twelve of the regions were also significantly hypermethylated in cancer vs. control mammoplasty samples (p ≤ 0.01; Table 3). Among the significantly hypermethylated sequences, the difference in average percent methylation for cancer vs. adjacent tissue and for cancer vs. control mammoplasty tissue was largest for RASSF1A (23.6 and 30.5 percentage points, respectively). Cancer-associated hypermethylation was seen in test sequences in extended promoter regions (regions immediately upstream or downstream of the transcription start site, TSS), in sequences upstream of promoter regions and in introns. A mostly similar pattern of cancer hypermethylation of these gene regions was observed in TCGA breast cancer and paired normal samples (Fig. 2, panel b).
Eight of the ten test regions overlapping DNA sequences previously reported to be hypermethylated in breast cancer vs. nonmalignant breast tissue or in more aggressive vs. less aggressive cancer types (APC, EGFR, GSTM2, GSTP1, LHX2, PITX2, RASSF1A and RUNX3) exhibited hypermethylation in this study at the designated p-value cutoff levels (p < 0.001 and p < 0.01, [...] SIX3 and SOX9), significant hypermethylation was seen in the cancer tissue compared with adjacent tissue, with the exceptions of SOX9 (p = 0.002) and NFE2L3 (Table 3). Results were not substantively different after adjusting for patient and tumor characteristics (age, race/ethnicity, ER/PR status, stage and grade; Table 4). When estimates were stratified by ER/PR status, several genes appeared to display differential changes in methylation for adjacent vs. cancer tissues (Table 4). GSTM2 exhibited more hypermethylation for ER/PR-negative tumors (p < 0.05), whereas EGFR displayed greater hypermethylation for ER/PR-positive tumors (p < 0.05). TFF1 and MAGEA1 displayed greater hypomethylation for ER/PR-positive tumors. NFE2L3 displayed hypermethylation for ER-positive tumors and hypomethylation for ER-negative tumors (p < 0.05; Table 4).
Table footnotes: SD, standard deviation. f These seven assays were not pursued beyond the pilot phase and, therefore, had 32-37 paired cancer and adjacent samples instead of 105-129. Differences were determined to be statistically significant at p < 0.001 for the complete sample set and p < 0.01 for regions only examined in the pilot study.

DNA hypomethylation in cancer vs. adjacent and control mammoplasty samples
We found that the promoter regions of TFF1 and MAGEA1 were hypomethylated in cancer compared with adjacent samples (p < 10⁻⁵) and in cancer vs. control mammoplasty samples (p < 10⁻⁴; Tables 3 and 4). MAGEA1 had high mean methylation levels in the control mammoplasty and adjacent samples (>80 % for both) but much lower DNA methylation levels in the cancer samples. TFF1 also had a high mean methylation level in the control mammoplasty tissue (82 %), although methylation was lower in adjacent tissue (72 %) and lowest in cancer tissue (49 %). Cancer-associated hypomethylation of TFF1 and MAGEA1 was also observed by Illumina HumanMethylation450 analysis of DNA methylation in the TCGA database for breast cancer and paired normal samples (Fig. 2, panel b, and Table 5). In addition, pyrosequencing revealed that the two studied DNA repeats, the tandem, juxtacentromeric satellite 2 (Sat2) and the interspersed repeat LINE-1, displayed significant hypomethylation in cancer vs. adjacent samples (Table 3). However, the extent of hypomethylation for these highly repeated sequences was much smaller (5.4 and 1.6 percentage points, respectively), which is not surprising given their very high copy number.
Fig. 2 [...] Table 1. a, DNA methylation analysis of samples from the Breast Cancer Care in Chicago study (2005-2008) as determined by our bisulfite pyrosequencing. Control samples (reduction mammoplasty) from unaffected women are represented by green bars, cancer-adjacent, histologically normal samples by blue bars and cancer samples by red bars. b, Bioinformatic analysis of DNA methylation of breast cancer samples and paired non-cancerous adjacent samples from The Cancer Genome Atlas (TCGA). Paired non-cancerous adjacent samples are represented by blue bars and cancer samples by red bars. In both panels, promoter sequences are displayed first, followed by upstream sequences, then introns and, lastly, DNA repeats.
Table footnotes: These seven assays were not pursued beyond the pilot phase and therefore have considerably fewer cancer and adjacent tissue samples analyzed. All estimates of mean percent methylation are adjusted for age, race/ethnicity, stage at diagnosis, tumor grade, and either adjusted for or stratified by ER/PR status.

Cancer-associated aberrant methylation in adjacent tissue vs. control mammoplasty samples
A comparison that we could make with our pyrosequencing data, but that is not possible with the TCGA database for breast samples, is an analysis of cancer-adjacent tissue vs. breast tissue from cancer-free individuals. Comparing methylation levels of the adjacent samples from breast cancer patients with those of the control mammoplasty samples revealed that RASSF1A had the largest difference in mean methylation (Table 3). Only five other sequences displayed hypermethylation or hypomethylation in adjacent vs. control mammoplasty samples at the significance level of p < 0.01 (SGK1, LINE-1, EGFR, Sat2 and TFF1; Table 3), and only the first two of these at p ≤ 0.001.
Surprisingly, the most statistically significant difference between methylation in adjacent tissue and control mammoplasty tissue was hypermethylation of LINE-1 (p < 10⁻⁹), in contrast with the hypomethylation of this repeat in cancer vs. adjacent samples (p = 10⁻⁴). However, the magnitude of the expected [45] hypomethylation of LINE-1 repeats in cancer vs. adjacent tissue was small (-1.6 percentage points), and the magnitude of the observed hypermethylation in adjacent vs. control mammoplasty tissue was modest (+4.1 percentage points). Mammoplasty control samples came from women who were younger (mean of 33 y, range 16-68) than the patients from whom the breast cancer samples originated (mean of 56 y, range 25-77), as would be expected given the availability of such samples. In addition, the cellular composition of breast tissue differs depending on whether it was derived from obese women, the likely source of most mammoplasty samples [46]. Therefore, small differences in methylation, such as those seen for LINE-1, need to be interpreted with caution.
Table footnotes: Pyrosequencing (Pyroseq) assay coordinates are given in Table 1. b Non-cancer tissue adjacent to paired breast cancer from 96 patients in the TCGA Illumina Methylation450 database for genome-wide DNA methylation [14]. c Samples from the invasive component of the breast cancer from patients in the TCGA Methylation450 and expression (expr.; RNA-seq) databases [16,17]. d Difference in mean % methylation for cancer minus that for the paired adjacent tissue for the 96 patients with both in the TCGA database. e From a dependent-sample Wilcoxon signed-rank test; p-values > 0.10 are suppressed. f Spearman correlation coefficient for methylation levels vs. expression levels among all the invasive breast cancer samples in the TCGA database. g The four underlined genes were the only ones that had a positive association of cancer methylation in non-promoter regions with expression levels of the associated gene.

Correlations between cancer-associated changes in DNA methylation and gene expression
An analysis of the Illumina HumanMethylation450 DNA methylation database for invasive breast cancers and the RNA-seq expression database for the same cancers in the TCGA collection [14] demonstrated that the methylation status of most of the studied regions was significantly associated with expression of the corresponding gene (Table 5). For this analysis, we focused on either the same small region studied by pyrosequencing or that region extended by 100 bp on either side (Table 5). All the promoter regions for which we demonstrated cancer-linked hypermethylation by pyrosequencing (BRCA1, CD44, GSTM2, GSTP1, MSI1, NFE2L3, RASSF1, RUNX3 and SIX3) exhibited an inverse correlation with expression among the cancers. Therefore, as expected [47], more promoter methylation was associated with lower expression levels. The two promoter regions displaying cancer hypomethylation (TFF1 and MAGEA1) also displayed an inverse correlation between methylation and expression among cancers, indicating that cancer-linked losses in promoter methylation were associated with increased (and abnormal) expression. Importantly, the only regions that displayed a positive correlation between methylation and expression among the breast cancers in the TCGA database were four far-upstream or intragenic regions, in EN1, PITX2, APC and LHX2.

Insights into DNA hypermethylation positively associated with gene expression from the ENCODE database
We compared DNA methylation in ENCODE RRBS profiles of normal breast epithelial cells (HMEC) and two breast cancer cell lines, MCF-7 and T-47D (ENCODE/RRBS/HudsonAlpha Institute; [18,19]). In addition, profiles of all mappable CpG sites in HMEC and the breast cancer cell line HCC1954 were available [15,18]. As expected, the differences between these cell cultures in DNA methylation at the promoter regions that we examined by pyrosequencing mostly mirrored the hypermethylation or hypomethylation observed by pyrosequencing in cancer vs. adjacent or control mammoplasty tissue (Additional file 1: Table S1).
Next, we used ENCODE transcriptome profiles for HMEC, many other normal cell cultures and MCF-7 cells (ENCODE/RNA-seq/Cold Spring Harbor Lab) to elucidate the positive association shown in Table 5 between cancer DNA hypermethylation and gene expression for the pyrosequenced regions in EN1, LHX2, PITX2 and APC. For EN1, methylation of its far-upstream region was positively associated with expression in a comparison of normal cell cultures. Normal myoblast and myotube cultures, which strongly and preferentially express EN1, were significantly hypermethylated in this far-upstream region compared with other studied cell cultures and tissues [48], including HMEC and MCF-7 cells (Fig. 1c and d). Unlike myoblasts and myotubes, the MCF-7 breast cancer cell line was hypermethylated not only in this region but throughout the body of the EN1 gene, which may explain why MCF-7 cells did not express EN1 while myoblasts and myotubes did. Similar to the studied EN1 far-upstream region, intron 3 of LHX2 and intron 1 of PITX2 exhibited muscle-lineage hypermethylation that was directly associated with highly specific expression in myoblasts (data not shown, [18]). In contrast, APC is broadly expressed among diverse cell types.

Histone modifications and gene expression from ENCODE
We also examined the pyrosequenced regions in ENCODE histone modification profiles, which were available for HMEC but not for MCF-7 (ENCODE/Histone Modifications by ChIP-seq/Broad Institute). These histone modification profiles distinguish between chromatin regions that are predicted to be active promoters (histone H3 lysine-4 trimethylation, H3K4me3, and H3K27 acetylation, H3K27ac), silenced regions (H3K27me3), active enhancers (H3K4me1 and H3K27ac), and poised promoters or enhancers (H3K4 methylation, sometimes with H3K27me3 but without H3K27ac) [49]. As expected, promoter hypermethylation in cancer usually occurred in regions displaying active promoter-type histone modifications in HMEC (Additional file 1: Table S1). The histone methylation profiles (Additional file 1: Table S2) also indicate that two of the studied intragenic regions, in EGFR (1.35 kb downstream of the TSS, in intron 1) and SOX9 (2 kb downstream of the TSS, in intron 2), have the chromatin modifications typical of active promoters in HMEC cultures, in which these genes are expressed.
Histone modification profiles for HMEC cultures were also very informative for the four intragenic or far-upstream regions that displayed breast cancer-associated DNA hypermethylation as well as a positive association between DNA methylation and expression among TCGA breast cancers (Additional file 1: Table S2). In HMEC cultures, the examined EN1, PITX2 and LHX2 chromatin regions all exhibited enrichment in H3K27me3. This histone mark is often, but not always, associated with repression and is frequently found in DNA regions of normal cells (especially stem cells) that become hypermethylated during carcinogenesis [50]. Unlike these three gene regions, the pyrosequenced region in APC exhibited the histone marks of an active promoter (H3K4me3 and H3K27ac) in HMEC. Although this region is 30 kb downstream of the APC TSS defined by isoform NM_001127511, it overlaps an alternative promoter associated with isoform NM_000038. Both isoforms encode the APC protein, and both are functionally important [51]. HMEC cultures express both isoforms abundantly, as indicated by histone modification and RNA profiling in ENCODE databases (Additional file 1: Table S2). However, TCGA methylation profiles showed that only the downstream alternative promoter region becomes hypermethylated in breast cancers. In the TCGA database, the average percent methylation at the upstream and downstream promoters was 10% and 28%, respectively, in invasive breast cancer, and 11% and 6% in paired normal tissue.
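The TCGA averages quoted above can be restated as cancer-minus-normal differences, which makes explicit that the cancer-associated gain is confined to the downstream APC promoter. The following is a minimal Python sketch of that arithmetic; the dictionary layout and the helper name `methylation_difference` are illustrative only.

```python
# Average percent methylation in the TCGA database, as quoted in the text.
tcga_apc = {
    "upstream promoter (NM_001127511)": {"cancer": 10, "normal": 11},
    "downstream promoter (NM_000038)": {"cancer": 28, "normal": 6},
}


def methylation_difference(cancer: float, normal: float) -> float:
    """Cancer minus paired normal methylation, in percentage points."""
    return cancer - normal


for region, values in tcga_apc.items():
    delta = methylation_difference(values["cancer"], values["normal"])
    # "+g" shows an explicit sign for both int and float inputs
    print(f"{region}: {delta:+g} percentage points")
```

Run as is, it reports a change of -1 percentage point for the upstream promoter and +22 percentage points for the downstream promoter.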

Discussion
Using a candidate gene approach on a large, ethnically diverse set of subjects, we compared not only invasive breast cancer and adjacent histologically normal tissue (as in the TCGA Illumina HumanMethylation450 database [14]) but also control samples of reduction mammoplasty tissue from non-cancer patients, using a quantitative, gold-standard method for DNA methylation analysis (bisulfite/pyrosequencing) that is amenable to archival FFPE samples. Our pyrosequencing analysis of DNA methylation covered promoter regions, regions far upstream of genes, intragenic regions, and high-copy interspersed or tandem DNA repeats. In addition, DNA methylation, transcriptome and histone modification profiles from the TCGA and ENCODE whole-genome databases were used to enhance the analysis. A limitation of our study is that clinical samples such as ours contain cell types other than breast epithelial cells, so the methylation levels we estimated represent an average across many cell types. Nonetheless, the similarities between the hyper- or hypomethylation seen in our bioinformatic comparisons of cancer-derived and normal mammary epithelial cell cultures (Additional file 1: Tables S1 and S2) and the aberrant DNA methylation found in our pyrosequencing study of malignant and non-cancerous breast tissues (Table 3) suggest that the methylation changes we detected in cancers vs. non-cancerous breast samples occur, at least in part, in the epithelial cell populations.
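The averaging limitation can be made concrete with a simple linear mixture model, in which a bulk sample's methylation is the cell-type-fraction-weighted mean of each cell type's methylation. This model and the function `bulk_methylation` are assumptions for illustration, not part of our analysis, and the cell fractions and methylation values in the example are hypothetical.

```python
from typing import Sequence


def bulk_methylation(fractions: Sequence[float], methylation: Sequence[float]) -> float:
    """Fraction-weighted average methylation (%) of a mixed-cell sample.

    fractions: share of each cell type in the sample (must sum to 1).
    methylation: percent methylation of the region in each cell type.
    """
    if len(fractions) != len(methylation):
        raise ValueError("fractions and methylation must have the same length")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(f * m for f, m in zip(fractions, methylation))


# Hypothetical illustration: 40% epithelial cells, 60% stromal/other cells.
normal_like = bulk_methylation([0.4, 0.6], [10.0, 10.0])
tumor_like = bulk_methylation([0.4, 0.6], [60.0, 10.0])
print(round(tumor_like - normal_like, 1))  # → 20.0
```

Under these assumed numbers, a 50-percentage-point gain confined to epithelial cells appears as a 20-point gain in the bulk measurement, which is why cell-type mixture tends to dilute, rather than create, the differences we observed.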
Besides confirming that a wide variety of DNA sequences display hyper- or hypomethylation in a large, diverse collection of invasive breast cancers vs. adjacent tissue, we demonstrated that six of the 16 DNA regions examined were significantly hyper- or hypomethylated in more than 100 histologically normal tissue samples adjacent to the breast cancers compared with 15-18 reduction mammoplasty samples. These six DNA sequences were located in promoter regions (RASSF1 and TFF1), an intron (EGFR), a far-upstream gene region (SGK1), or DNA repeats (LINE-1 and Sat2). If control mammoplasty samples reflect the epigenetics of normal breast tissue, then our results suggest a field effect that could include changes which predispose to carcinogenesis [12]. The adjacent tissue samples used in our comparison had been carefully evaluated morphologically and histologically and showed no evidence of malignancy. In addition, the lack of evidence for a field effect at most of the studied DNA regions, including regions with frequent hypermethylation in the cancer tissue (e.g., RFX1 and EN1), is consistent with a field effect rather than contamination of adjacent samples with tumor tissue.
Field effects for DNA methylation changes in RASSF1, EGFR and TFF1 might influence preneoplastic changes in gene expression relevant to tumor development. RASSF1 is a tumor suppressor gene that regulates apoptotic and cell cycle checkpoints [52]. RASSF1 hypermethylation has been detected in carcinoma in situ and invasive breast cancer and is inversely correlated with RNA and protein expression levels [24,53,54] and overall survival [55]. Like overexpression of HER-2 protein, overexpression of EGFR protein, another member of the epidermal growth factor receptor tyrosine kinase family, is associated with multiple drug resistance and decreased patient survival [56,57]. By comparing cancers with adjacent tissue, we demonstrated significant hypermethylation of EGFR in intron 1, within part of the extended promoter-like chromatin region (see below). Hypermethylation in this region was also seen in the comparison of histologically normal, cancer-adjacent tissue with control mammoplasty tissue. Given the proto-oncogene status assigned to EGFR, it is not yet clear what role hypermethylation of EGFR might play in breast cancer progression.
Expression of TFF1, a gene encoding a small secretory peptide implicated in preserving the mucosa of the intestinal tract, is associated with promoter hypomethylation in cultured cells and in breast cancers [29,58]. We found hypomethylation of the promoter of this estrogen-inducible gene both in cancer vs. adjacent tissue and in adjacent vs. control mammoplasty tissue. Studies of breast cancer cell lines and a mouse model suggest that TFF1 expression in breast cancer may be associated with a poor outcome [59]. However, a recent study of breast cancer patients found that TFF1 expression was greater in ER/PR-positive breast cancers, which generally have a better prognosis than ER/PR-negative breast cancers [60]. Similarly, we found that ER/PR-positive tumors showed greater TFF1 promoter hypomethylation than ER/PR-negative tumors.
One surprising result from our analysis was that the extended promoter region of BRCA1 did not show significant hypermethylation in breast cancer relative to adjacent tissue or control mammoplasty tissue, even though the promoter region we chose is similar to or overlaps those employed in other studies, many of which did find BRCA1 hypermethylation in breast cancer [28,31,61-64]. The first three of these studies used end-point methylation-specific PCR, which is extremely sensitive for detecting any DNA methylation but is not quantitative, and they reported only the percentages of samples called as methylated. We found very low average levels of methylation in the BRCA1 promoter region in all sample types (1.4%, 1.6% and 3% for control mammoplasty, adjacent and cancer samples, respectively), with a few outliers showing considerable methylation. In a study using MALDI-TOF mass array analysis of 48 FFPE samples, only five of the 17 tested CpG sites displayed hypermethylation in breast cancer vs. matching control tissue, and the extent of hypermethylation at these five sites was surprisingly high (averages of about 90% for cancers vs. about 10% for controls) [63]. However, consistent with our study, MALDI-TOF [65] and methylation-specific multiplex ligation assays [64] by two other groups, each using flash-frozen breast cancers and matching control tissue, revealed only low percentages of cancers with greater BRCA1 methylation than controls. Moreover, in their studies and ours, cancer hypermethylation was much more frequent at many other tested promoter regions. Similarly, bioinformatic analysis of methylation levels in the TCGA HumanMethylation450 database revealed no significant difference between breast cancer and adjacent tissue for the BRCA1 promoter region that we examined (Table 5).
Our pyrosequencing finding that intronic sequences far downstream of the canonical promoter region (EGFR, LHX2, SOX9 and RFX1) and intergenic sequences upstream of the promoter (PAX3 and EN1) were significantly hypermethylated in breast cancer may be related to the emerging understanding of the transcription-regulatory roles played by DNA methylation in intragenic and distant intergenic regions [2,66]. For example, sequences considerably downstream of the TSS may be part of the functional promoter or of transcription-elongation regulatory elements, so that local methylation could alter gene expression levels. This may be the case for the pyrosequenced regions of EGFR (1.35 kb downstream of the TSS, in intron 1) and SOX9 (2 kb downstream of the TSS, in intron 2), both of which displayed hypermethylation in cancer. In normal HMEC, where these two genes are actively transcribed, ENCODE histone modification profiling (Additional file 1: Table S2) indicates that the studied regions overlap large chromatin segments with histone modifications typical of active promoters, starting in the canonical promoter region and continuing into the 5' intragenic region [13].
The importance of not restricting analysis of cancer-linked aberrant DNA methylation to standard promoter regions is also apparent from recent studies providing evidence that DNA hypermethylation is implicated in alternative promoter usage, in the regulation of RNA splicing and, in certain intragenic regions, in upregulation of expression [2,66]. Indeed, examples of the latter were seen in our bioinformatic analyses of DNA methylation and expression databases for breast cancers (TCGA) and cultured cells (ENCODE). For example, we found that DNA hypermethylation at APC was positively associated with expression among breast cancers (TCGA database). Although the examined APC region is 30 kb downstream of the TSS, in intron 1 of one APC isoform expressed in HMEC, it also lies in the promoter region of another protein-coding isoform of the gene transcribed in HMEC. Therefore, cancer-associated hypermethylation of this APC intron/promoter region might help regulate alternative promoter usage for this gene.
The other three regions with breast cancer-associated DNA hypermethylation, whose genes showed significantly higher expression in breast cancers with higher levels of DNA methylation, are located 5.6 kb upstream of EN1, 7 kb upstream of PITX2, and 4 kb downstream of the LHX2 TSS. These genes encode homeobox-containing transcription factors important in development. The location of these regions and their association with normally repressive H3K27me3 in HMEC cultures (Additional file 1: Table S2) suggest that the positive correlation of DNA methylation at these regions with transcription may reflect a role for these regions in controlling the borders of active promoter regions and counteracting the spread of repressive H3K27me3 chromatin into the core promoter [67].
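The positive methylation-expression associations discussed for APC, EN1, PITX2 and LHX2 amount to a correlation computed across tumors. The statistic behind Table 5 is not specified in this section, so the Python sketch below assumes a Spearman rank correlation, which is unaffected by monotonic rescaling such as log-transforming expression values. The function names and the per-tumor inputs are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)
```

Called as `spearman_rho(methylation_pct, expression)` over matched per-tumor values, a positive result would correspond to the kind of positive association described here; a p-value would need an additional test (for example `scipy.stats.spearmanr`), which is not shown.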

Conclusions
We identified frequent DNA methylation changes in invasive breast cancer at a variety of genome locations and found evidence for an extensive field effect in breast