Genome-wide DNA methylation signatures to predict pathologic complete response from combined neoadjuvant chemotherapy with bevacizumab in breast cancer

Neoadjuvant chemotherapy is given before surgery to patients with primarily unresectable, advanced-stage cancers to render the tumor resectable, and to facilitate breast conservation. Previously, we have reported a prospective phase II trial for women with stage IIA-B/IIIA-B-C breast cancer with improved pathologic complete response (pCR) when using bevacizumab in the neoadjuvant setting. Chemotherapy agents are given intravenously during multiple cycles of systemic treatments. However, the effect of the treatment is only evaluated upon the completion of therapy. Here, data from a clinical trial of 40 breast cancer patients with very aggressive disease and poor prognosis were studied aiming to identify epigenetic signatures in blood-derived DNA at baseline as potential non-invasive markers to predict pCR and to determine if treatment-related changes in epigenetic profiles reflect responsiveness to therapy. We performed genome-wide DNA methylation profiling using blood-derived DNA, and found that pre-treatment methylation status of BRD9 was predictive of responsiveness to therapy. Post-treatment global methylation differences were also observed between responders and non-responders. Most differentially methylated (DM) CpGs were located in promoter CpG-island regions for responders and in the open-sea region for non-responders. In responders, DNMT3B was hypomethylated while most of the other genes were hypermethylated after 4 cycles of treatment. Hypomethylation of DNMT3B could potentially lead to the increased methylation of oncogenes and genes responsible for cell growth and proliferation, facilitating responsiveness to the therapy. These results support the possible development of BRD9 as a biomarker for treatment selection before neoadjuvant therapy with chemotherapy and bevacizumab, and indicate DNMT3B as a potential target to improve clinical response. Further prospective validation of these findings is warranted. Trial registration ClinicalTrials.gov Identifier: NCT00203502.

arm, single-institution phase II study approved by the Institutional Review Board (IRB) of the University of Arkansas for Medical Sciences between September 2005 and January 2008. Enrolled participants consisted of women older than 18 years with histologically confirmed stage II or III breast cancer, who did not have inflammatory BC (T4d) or any recent arterial thromboembolic events, uncontrolled hypertension, recent major surgery, or baseline left ventricular ejection fraction (LVEF) <50%. The participants were offered chemotherapy combination with docetaxel (75 mg/m2), cyclophosphamide (500 mg/m2), and bevacizumab (15 mg/kg) every 3 weeks×4 cycles followed by doxorubicin (60 mg/m2) every 3 weeks×4 cycles before undergoing surgery for their cancer. After surgery, patients were evaluated for pCR as the primary clinical endpoint, defined as the absence of invasive cancer in the breast specimen. Of the 40 patients enrolled, one withdrew consent and one did not undergo surgery after NCT but she was included in the intent to treat analysis. Sixteen (41%) of the 39 evaluable subjects attained pCR, significantly higher than the null-hypothesis rate of 25% (P = 0.0204). We have previously reported that the planned sample size of 40 subjects gave NCT00203502 sufficient power to detect a hypothesized increase of 20 percentage points in pCR [2]. For this report, we undertook retrospective power analyses of post-treatment changes in methylation status detectable at FDR-adjusted p < 0.05 in the responder and non-responder subgroups. The results of that retrospective power analysis appear below, and show that the subgroups had adequate sample sizes for the planned methylation analysis.

Sample collection, storage, and DNA isolation
Blood samples were collected from patients at baseline, after 4 cycles of docetaxel, cyclophosphamide and bevacizumab (DCB), and just before infusion at each of the 8 preoperative chemotherapy cycles. Blood components were separated, aliquotted into cryovials, and stored at -80˚C for later analyses. Genomic DNA was extracted from buffy coats of the blood samples collected using Maxwell (Promega, Fitchburg, WI, USA). Concentration and quality of the extracted genomic DNA (gDNA) was measured using the Nanodrop 8000 instrument (Thermo Scientific, Waltham, MA, USA).

Infinium methylation EPIC beadChip analysis
Following bisulfite treatment of 1 μg genomic DNA using the EZ DNA Methylation kit (Zymo Research, Irvine, CA), the bisulfite-converted DNA was hybridized onto the Infinium Methylation EPIC BeadChip (Illumina, San Diego, CA), following the Illumina Infinium HD Methylation protocol in the Genomics Core Facility at UAMS. The Methylation EPIC BeadChip covers over 850,000 CpG sites, and has increased genome coverage of regulatory regions and higher reproducibility and reliability compared to previous versions [3]. Whole genome amplification, hybridization, staining and scanning steps for all samples were performed, the Illumina iScan SQ scanner was used to create images of the single arrays, and the intensities of the images were extracted using the Methylation module (v.1.9.0) of the GenomeStudio (v.2011.1) software (Illumina). Raw intensity data as IDAT files were imported into GenomeStudio for the computation of detection p-value of the probes, and all further steps including data import, normalization, filtering and analyses were performed using the methylation pipeline in Partek Genomics Suite™ 6.6 (Partek Inc., St. Louis, MO).

Methylation data analysis
The data was normalized using the Sub set-quantile Within Array Normalization (SWAN) [4]. For analysis, probes were excluded if detection p values were > 0.01 (n = 130), or if SNPs were present in the target CpG based on NCBI dbSNP Build 138 (n = 59). Probes on the X and Y chromosomes (n = 19,572), and probes with polymorphic targets & cross-hybridization potential [5] (n = 318,809) were also excluded. The final dataset contained 527,348 probes across all samples. Percent methylation values for each CpG site (β-values) and logit-transformed ratios of methylated to unmethylated probe intensities (M-values) were extracted for further analysis [6]. Since race plays an important role in genome-wide DNA methylation [5], data from one Asian and one Hispanic participant was excluded from the analysis.
For pattern identification in DNA methylation, unsupervised analysis including unsupervised hierarchical clustering [7] and Principal Component Analysis (PCA) were used. T-tests and chi-square (X 2 ) tests were performed to evaluate differences between non-responders and responders. Analysis of variance (ANOVA) adjusting for race, tumor type, and estrogen receptor (ER) status with Fisher's Least Significant Difference contrast method were used to assess the differentially methylated (DM) CpG sites univariately for their potential to predict the subsequent response to therapy. The resulting P-values were adjusted for multiple testing with the false discovery-rate (FDR) procedure of Benjamini and Hochberg [8], and significance was granted with FDR P-value <0.05 to identify significantly differential DNA methylation. In subgroup analysis, we examined post-treatment changes from baseline within each responder group, again using ANOVA adjusting for race, tumor type, and estrogen receptor (ER) status with Fisher's Least Significant Difference contrast method. Classical Receiver Operating Characteristic (ROC) curve analysis was used to evaluate the performance of a single CpG site as a biomarker.
To determine whether the responders and non-responders had adequate sample sizes for the subgroup analysis of post-treatment change in methylation, a retrospective power analysis was conducted using the FDR-control method of Jung et al. [9]. We define a truly positive CpG site to be one that shows a standardized change of �1.5 SD in post-treatment methylation level compared to baseline. To calculate power retrospectively for multiple 2-sample ttests with FDR set equal to 5%, we hypothesis that 1.00% (5,274) of the 527,348 CpG sites in the final data set satisfy the �1.5-SD criterion of being truly positive. Under this hypothesis, a sample size of N = 16 responders provides the multiple 2-sample t-tests at FDR = 5% with 89.77% power per test to detect a truly positive CpG site, and yields a binomial expected value ±1 SD of 4,734 ±22.0 positive test results (true discoveries) from the 5,274 CpG sites hypothesized to be truly positive. Under the same hypothesis, a sample size of N = 21 nonresponders provides the multiple 2-sample t-tests at FDR = 5% with 99.01% power per test to detect a truly positive CpG site, and yields a binomial expected value ±1 SD of 5,222 ±7.2 positive test results (true discoveries) from the 5,274 CpG sites hypothesized to be truly positive. These retrospective power calculations demonstrate that both the responders and the non-responders had sample sizes large enough to power adequately the subgroup analysis of methylation response to treatment.

Pathway analysis
The genes corresponding to the significant loci were projected onto knowledge-based networks for their potential biological implications using Ingenuity Pathway Analysis (IPA 1 , www.qiagen.com/ingenuity). IPA employs information obtained from the literature to assemble and extrapolate known interactions, signaling, as well as the relationships between the molecules. The analysis here was restricted to genes or molecules that have been observed directly in human breast or immune cell lines in the literature. Network analysis was generated de novo based on the mapped CpG sites to explore potential molecular events and mechanisms impacted by the DM CpGs. The P-score [computed as-log 10 (P-value)] reflected the probability of the identified genes in a list of biological functions stored in the IPA knowledge base.

Demographics of the study participants
Samples for CpG methylation analysis were available for 37 patients; their demographics and disease characteristics are shown in Table 1. The two groups differed significantly at p < 0.05 in race (African American, AA; and European American, EA) and estrogen receptor (ER) status. Although differences in the type of tumor did not reach statistical significance (p = 0.07), a tendency was observed for most of the lobular and poorly differentiated breast cancer patients to be non-responders. Thus, race, ER status, and tumor type were included as covariates to adjust for in the subsequent analysis.

DM CpG sites at baseline predictive of pathologic response
First, we aimed to identify germline DM CpGs at baseline that can serve as non-invasive markers to predict their pCR status following NCT. From our 4-way analysis of variance (ANOVA) model controlling for race, ER status, tumor type, and pCR, 25,321 CpG sites were significantly different between responders and non-responders at baseline (unadjusted p < 0.05), but only one CpG passed multiple-testing adjustments (FDR p < 0.05, Fig 1a). This CpG site (cg06362249) binds to three BRD9 transcripts (Chromosome 5, NR_02763, NM_001009877, NM_023924) in the gene body, was hypermethylated among responders at baseline compared to non-responders (Fold change = 1.4; unadjusted p = 7.78e-009, FDR p = 0.004), decreased its methylation level by 1.24 fold after 4 cycles of treatment among responders (unadjusted p = 0.02), but did not change appreciably after treatment among non-responders (Fold change = 1.0; unadjusted p = 0.69). The area under the curve (AUC, Fig 1b) for cg06362249 is 0.842 (95%CI: 0.708-0.976) in the Receiver Operating Characteristic (ROC) analysis with the optimal cutoff of 0.889 in β value (% methylation, see Fig 1b and 1c) between responders and non-responders. The methylation levels of cg06362249 were not distinguishable between responders and non-responders after 4 cycles and 8 cycles of treatment (Fig 1a).

Responder-& non-responder-specific CpG sites
To determine the effect of treatment on global methylation profiles specific to the pCR group, separate analyses were performed using the 4-way ANOVA model controlling for race, ER status, treatment cycle, and tumor type for responders and non-responders. There were 5,246 CpGs (FDR p < 0.05, Fig 2a) among responders that were DM between baseline and the 4 th treatment cycle, 4,984 (95%) of which are projected to be true discoveries under a 5% FDR. A projection of 4,984 true discoveries among responders is slightly higher than this group's estimate of 4,734 ±22.0 true discoveries from the retrospective power analysis, suggesting that the proportion of CpG sites defined to be "truly positive" (see above) is consistent among responders with the hypothesized rate of 1%. In contrast, there were only 1,201 CpGs (Fig 2b) among non-responders that were DM between baseline and the 4 th treatment cycle, 1,141 (95%) of which are projected to be true discoveries under a 5% FDR. A projection of 1,141 true discoveries among non-responders is more than 78% lower that this group's estimate of 5,222 ±7.2 true discoveries from the retrospective power analysis, indicating that the proportion of "truly positive" CpG sites among non-responders is much lower than the hypothesized rate of 1%. When we inspected the DM CpGs among responders, 67% were localized in the promoter regions that correspond to regions TSS1500 (17%), TSS200 (19%), 5'UTR (17%), 1stExon (14%), and ExonBnd (0.4%) (Fig 3a left). In contrast, 46% of the DM CpGs among nonresponders were localized in the promoter regions, including TSS1500 (12%), TSS200 (11%), 5'UTR (15%), 1stExon (7%), and ExonBnd (0.4%) (Fig 3b, left). Further analysis on the distribution of DM CpGs by CpG-island regions revealed more changes in the CpG islands (60% ,  Fig 3a, right)   baseline and after 4 cycles of NCT treatment, as presented in Table 2. All 10 of the top CpG sites from responders belonged to CpG islands within named genes. In contrast, 8 of the 9 top CpG sites from non-responders belonged instead to "open sea" regions, 3 of which were located in intergenic regions.

Pathway analysis
In order to explore potential molecular events and mechanisms impacted by the neoadjuvant chemotherapy, we applied a statistical-significance threshold of FDR p < 0.05 coupled with a scientific-significance threshold of absolute fold change > 1.5 signifying methylation changes of at least 50% from the therapy. This yielded 651 responder-specific DM CpGs and 355 non-responder-specific CpGs (S1 Fig) that Fig 4a). Top canonical pathways involved in the network were PI3K/Akt signaling and glucocorticoid receptor signaling, with DNMT3B decreased in methylation levels after 4 cycles of NCT while almost all the other genes in the network were increased in the methylation levels after the therapy (Fig 4a). Among non-responders, the top network   Fig 4b). Top canonical pathways involved in the network were glucocorticoid receptor signaling and telomerase signaling. Full list of networks affected from both responders and non-responders are available in the S1 and S2 Tables.

Discussion
Neoadjuvant chemotherapy is administered to patients with operable and locally advanced breast cancers to facilitate breast conservation, or to render inoperable breast tumors viable candidates for surgery. However, the evaluation of clinical response to NCT is delayed until the completion of the therapy. Therefore, biomarkers to predict pCR are urgently needed to avoid unnecessary toxicity to the patient from NCT. Here, we aimed to identify genomic signatures of clinical response among breast cancer patients with very aggressive disease and poor prognosis who underwent NCT using a genome-wide methylation approach. Methylation levels of bromodomain-containing protein 9 (BRD9) in the gene body were significantly different at baseline between responders and non-responders, and were the strongest predictors of clinical response in this study. We also identified 5,246 DM CpG sites among responders, with the majority localized in the CpG islands of promoter regions, while 1,201 DM CpG sites were associated with non-responders. The majority of these sites were localized in the intergenic and open-sea regions. Finally, pathway analysis revealed that the DM CpGs among responders were mapped to canonical pathways including PI3K/Akt signaling and glucocorticoid receptor signaling, with decreasing methylation of DNMT3B, a major gene responsible for methylation, in responders after 4 cycles of NCT. BRD9 protein is an epigenetic "reader" of acetylated lysines on post-translationally modified histone proteins that facilitates gene expression. Although little is known about the function of BRD9, copy number gains of the BRD9 gene in lung cancer patients has been reported and was associated with lung cancer susceptibility [10,11]. It is over-expressed in cervical cancer [12], promotes MYC expression and cell proliferation in acute myeloid leukemia [13]. Protein-engineering and inhibitor studies targeting BRD9 have shown promise in overcoming epigeneticallydefined drug resistance [14], and single or combinatorial effects of BRD9 inhibitors with doxorubicin or carboplatin also resulted in anti-proliferative effects in five malignant rhabdoid tumor cell lines [15]. Risdom et al. revealed that epigenetic plasticity mediated by the chromatin modifier BRD4 drives survival of triple-negative breast cancer cells after targeted therapy treatment [16]. In our study, baseline methylation levels of BRD9 in the gene body were significantly higher among responders compared to non-responders. Gene body methylation has been shown to increase the potential for both somatic and germline mutations [17], and might involve in the recruitment of DNMTs [18,19]. Functionally, hypermethylation of BRD9 reduces the amount of available BRD9 readers, thus preventing recognition of transcription regulators to the acetylated histones, further resulting in the reduced level of transcription. Conversely, lower levels of methylation among non-responders increases the transcription of BRD9, thus facilitating gene transcription. Whether the differences affects structural change of the BRD9 gene, binding affinity of the gene, or expression of the gene will need to be further studied.
While chemotherapy profoundly impacts DNA methylation landscape in cancer patients [20,21], resistance to treatment continues to be a major obstacle for the success of cancer therapy. In our study, increased in the CpG site methylation levels among responders were all mapped to CpG islands including CDC25B, WBSCR27, ADNP, NEK9, COQ2, SOX5, ACPL2, RBM15, TLE2, RB1CC1 (Table 2). Those genes have been shown to have key roles in G2/M cell-cycle progression (CDC25B), methyltransferase (WBSCR27), determination of the cell fate (SOX5), and cell growth (RB1CC1), which the expression levels would expect to be inhibited after treatment. On the other hand, CpG changes among non-responders were mainly mapped to open sea or intergenic regions. Lund et al. reported that the resistance of cisplatin on ovarian cancer cells is associated with loss of hypermethylation in a high number of CpG sites, and are primarily localized in the intergenic regions of the genome [22]. Gene expression can be regulated by the methylation of intergenic regions [23][24][25], which contain key regulatory elements including enhancers, silencers, and noncoding RNAs. Our results support the role of DNA methylation-associated cancer chemotherapy resistance, and identified putative genes that might be associated with drug resistance. Further investigation in larger studies is needed in order to elucidate the complex molecular mechanisms involved in drug resistance.
Patterns of DNA methylation are established and maintained by DNA methyltransferases (DNMTs) [26]. It is well-known that global DNA hypomethylation in cancer is an early epigenetic abnormality in cancer compared to their normal-tissue counterparts [27,28], and may result in the accumulation of epigenetic abnormalities leading to chromosomal instability, increased transcription from transposable elements, and an elevated mutation rate due to mitotic recombination [29]. Thus, DNA methylation has been a target of anticancer therapies in the past few decades [30], with much effort being focused on the development of 5-azanucleosides or non-nucleoside DNMT inhibitors to prevent the translation of oncogenic proteins [31,32]. However, not all breast tumors overexpress DNMTs [33] and little is known about the specificity of DNMTs. In fact, many of the DNMTs are involved in specific and non-specific DNAmethylation machineries [34], suggesting a tight regulation over the DNA methylome to maintain genome and epigenome integrities. Furthermore, DNMT inhibitors have been reported to increase the proliferation of pluripotent stem cell [35] or reprogramming through epigenetic modifications [36], as well as to have immune-suppressive effects [37]. In our study, NCT increased the methylation levels for 4,323 CpG sites and decreased methylation levels of 923 CpGs among responders, potentially through the decreased methylation (increased gene expression) of DNMT3B as indicated in Fig 4a, suppressing the expression of certain resistance genes thus leading to pathological complete response. On the other hand, methylation levels of 721 CpGs were increased among non-responders and 480 CpGs were decreased in methylation levels (Fig 2), mostly among the open sea regions. No significant change in methylation was observed for DNMT3B among non-responders. Further studies are needed in order to fully understand the function and mechanism underlying the observed differences.
Other attempts at identifying epigenetic markers including targeted approach by examining gene-specific methylation markers including the use of PITX2 methylation to predict anthracycline-based chemotherapy for triple-negative breast cancer and high-risk breast cancer [38,39], to examine changes in the methylation levels in estrogen receptor-α promoter regions among triple-negative breast cancer patients [40], as well as global approach to identify HSD17B4 methylation as predictor of HER2-positive breast cancer to trastuzumab and chemotherapy [41]. Another global approach by Gampenrieder et al. reported the use of DNAmethylation signatures in archival formalin-fixed paraffin-embedded (FFPE) specimens to predict Bevacizumab efficacy in HER2-negative metastatic breast cancer by the Illumina Infinium HumanMethylation450 BeadChip [42]. Albeit using the older version of the beadchip, the study identified 9-gene and 3-gene DNA-methylation signatures that were predictive for responders with longer progression-free survival [42]. The study focused on patients with metastatic breast cancers treated with different chemotherapy backbones (taxane or capecitabine) alone or in combination with bevacizumab, which may have different DNA methylation profiles than patients with locally advanced breast cancer treated with neoadjuvant therapy containing bevacizumab in our study. However, their study has a significant age difference (median age 61) compared to ours (median age 43), and did not account for racial differences in the analysis. , which does not contain the BRD9 probe we identified from the study. Our samples were collected in 2005-2008 without RNA stabilization reagent so we were unable to validate the results using gene expression assays. However, we do want to acknowledge that the data from Illumina Infinium HumanMethylation BeadChips, especially the Illumina EPIC BeadChips is highly robust and reproducible and is a significant improvement over the HM450 array for human studies [3,49]. Despite limitations in validation and sample size, this study highlights the use of methylation signatures with the use of stringent statistical methods to correct for multiple comparisons to predict clinical response of breast cancer patients to NCT. In this regard, a larger-scale effort to define the usefulness of the markers may have the potential to greatly expand the clinical utility and to be evaluated in future clinically relevant studies.