Gene expression in "young adult type" breast cancer: a retrospective analysis.

BACKGROUND
Young women with breast cancer experience inferior outcomes and commonly manifest aggressive biological subtypes. Data are controversial regarding biological differences between breast tumors in young (diagnosed at < 40 years of age) versus older women. We hypothesize there may be age-related expression differences in key genes for proliferation, invasion and metastasis within and across breast cancer subtypes, and that these differences correlate with outcome.


METHODS
Using clinically-annotated gene expression data from 778 breast tumors from three public databases, we compared clinico-pathologic characteristics, mRNA expression of 17 selected genes, and outcome, as a function of age (< 40 years vs. ≥ 40 years).


RESULTS
Fourteen of 17 genes were differentially expressed in tumors of young vs. older women, 4 of which persisted after correction for subtype and grade (p ≤ 0.05). BUB1, KRT5, and MYCN were overexpressed and CXCL2 underexpressed in young women. In multivariate analysis, overexpression of cytokeratin genes predicted inferior DFS only for young women. Overexpression of ANGPTL4 strongly predicted inferior DFS in basal but not HER2-enriched tumors in young women. Overexpression of cytokeratin genes and MYBL2 and low SNAI1 expression correlated with inferior DFS in HER2-enriched tumors in younger women. Kaplan-Meier analysis within the basal and HER2-enriched subgroups showed that overexpression of cytokeratin genes was associated with inferior DFS for young, but not older, women.


CONCLUSIONS
This preliminary study reveals age- and subtype-related differences in expression of key breast cancer genes for proliferation, invasion and metastasis, which correlate with prognostic differences in young women and suggest targeted therapies.


INTRODUCTION
Breast cancer is the most common malignancy in young women aged 15-39 years, and young age is an independent risk factor for death from breast cancer [1]. Young women tend to present with higher grade, biologically aggressive tumors (i.e., basal and HER2-enriched subtypes) compared to older women [2]. Women under 40 years of age with early stage breast cancer are 40% more likely to die of their disease than older counterparts [3]. While clinicopathologic differences point to underlying biologic differences between the breast tumors arising in younger versus older women, prior studies have yet to document age-related changes in global gene expression beyond those attributable to increased frequency of aggressive subtypes in younger patients [4].
www.impactjournals.com/oncotarget
There are currently limited data to explain why a higher percentage of younger versus older women develop biologically aggressive breast cancer subtypes, or why young women with early stage disease have disproportionately higher mortality compared to older women. In this study we took a candidate gene approach to address this question, analyzing the expression of well-known breast cancer genes with strong potential for prognostic significance as a function of age. A notable study describes a 5-gene classifier that holds prognostic significance [5]. This classifier takes into account protein expression of epidermal growth factor receptor (EGFR) and cytokeratin (CK) 5/6 by immunohistochemistry, in addition to expression of estrogen receptor (ER), progesterone receptor (PR) and HER2 (human epidermal growth factor receptor 2). Patients with triple negative breast cancer in the "core basal" subgroup (whose tumors lacked expression of ER, PR and HER2, yet expressed EGFR and/or CK 5/6) had inferior outcome following anthracycline-based therapy regimens as compared to patients whose tumors lacked expression of all 5 biomarkers.
The "core basal" subtype was more commonly seen among younger women under 40 years of age as compared to older women. Specifically, 18% of breast cancers were "core basal" (71/380) among patients aged ≤40 compared to 7% (265/366) among patients aged >40 [6] [7]. Thus, we hypothesized that there may be age-specific differential expression of key genes relating to breast cancer proliferation, invasion or metastases, including CK 5/6, EGFR, and others, across and within breast cancer subtypes, and that these differences may hold prognostic importance.

Age-specific clinical characteristics
Of the 778 patients included in this analysis, 13% (n = 103) were aged < 40 years (24-39 years of age, with median age 36) while 87% (n = 675) were aged 40 years or older (40-93 years of age, with median age 52) (Table 1). A higher proportion of younger women were diagnosed with HER2-enriched and basal breast cancers when compared to older women (23.3% versus 17.2%   [15], young women were also more likely than older women to be diagnosed with grade 3 tumors (OR = 4.05, p = 0.0002), while they were less likely to be diagnosed with ER positive as compared to ER negative breast tumors (OR = 0.51, p = 0.003). More older women received endocrine therapy (with or without chemotherapy), likely a result of a greater proportion of older (vs. young) women being diagnosed with endocrine sensitive breast cancer. Rates of receipt of chemotherapy as a single modality of treatment were similar between age groups (p = 0.23).

Table 2 (fragment). Columns: gene | gene ID | biologic function and reported association
KRT5 | 3852 | Overexpression of cytokeratin 5/6 associated with inferior outcome in basal subtype [5]
KRT6A | 3853 | Overexpression of cytokeratin 5/6 associated with inferior outcome in "core basal" subtype [5]
KRT6B | 3854 | Overexpression of cytokeratin 5/6 associated with inferior outcome in "core basal" subtype [5]
EGFR | 1956 | Overexpression associated with inferior outcome in "core basal" subtype [5]
MYBL2 | 4605 | Involved in cell proliferation and survival. Overexpression common in high grade, node negative breast cancer and associated with poor response to therapy and inferior outcome. Gene that appears most often in microarray classifiers.

Analysis of single gene expression by age
We analyzed the expression of 17 genes key to breast cancer proliferation, invasion, and metastasis as a function of age (Table 2). Before adjustment for subtype and tumor grade, 14 of the 17 genes were differentially expressed in young compared to older patients (p < 0.05). Thirteen of the fourteen genes were overexpressed, while one gene (CXCL2) showed decreased expression in young versus older women (Table 3, Univariate Model). Correction for tumor subtype and grade was performed in a multivariate regression model and 4 genes remained differentially expressed in young versus older women (Table 3

Association between gene expression and DFS in univariate and multivariate models
Initially, the association between DFS and expression of the 17 selected genes was assessed separately for young and older patients (Table 4 and Supp. Table 1). In the young group, significant associations were found between DFS and the following genes: ADM, ANGPTL4, AURKA, KRT6A, EGFR, MYBL2 and VEGFA (Table 4, Univariate Model, all p < 0.05).
In the older group, mRNA expression levels of ADM, ANGPTL4, AURKA, EGFR, MYBL2, and VEGFA were also associated with DFS; however, expression of KRT6A was not associated with outcome in women over age 40. In the more stringent analysis of these data correcting for multiple gene comparisons, ANGPTL4 and VEGFA maintained significance in the younger, but not the older, group of patients (adjusted p < 0.05).

Association between gene expression and DFS for HER2-enriched and basal breast cancer subtypes in univariate and multivariate models
Recognizing the aggressive nature and increased incidence of HER2-enriched and basal breast cancer among younger women, we performed a similar analysis within these two breast cancer subtypes. Within the basal subtype, overexpression of ANGPTL4 (HR 1.5, CI 1.17-1.96, unadjusted p = 0.002, adjusted p = 0.034) was significantly associated with DFS when correcting for grade in multivariable analysis (

Disease-free survival analysis using the Kaplan-Meier method
To further interrogate the association of gene expression and DFS in basal and HER2-enriched subtypes known to frequently occur in young women, we categorized expression levels for each gene as high or low (cut at the median) and created Kaplan-Meier survival plots based on high vs. low gene expression. For four genes in the data set (ANGPTL4, KRT6A, KRT6B and SNAI1), there was a significant association between gene expression and DFS by the Kaplan-Meier method in young women. None of the other genes were significant for the younger group.
Overexpression of ANGPTL4 was associated with worse DFS for both younger (p = 0.006, HR 4.76) and older patients (p = 0.035, HR 1.88) with basal breast cancer. This association was not seen in patients with HER2-enriched breast cancer (young, p = 0.076, HR 2.98; older, p = 0.27, HR 1.37) (Figure 2). Overexpression of KRT6A was significantly associated with worse DFS in younger patients with both basal and HER2-enriched breast cancer (p = 0.038, HR 2.85; p = 0.032, HR 3.6, respectively). There was no association between KRT6A and DFS among older patients with either basal or HER2-enriched breast cancer (p = 0.22 and p = 0.88, respectively) (Figure 3). Overexpression of KRT6B was associated with worse DFS in younger patients with HER2-enriched breast cancer only (p = 0.01, HR 4.1); a non-significant inverse relationship was seen in older patients in this group (p = 0.14, HR 0.65) (Figure 4).
The bold genes are significant at the nominal 0.05 significance level and the bold italic genes are significant at the 0.05 adjusted significance level in the multivariate model.
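The median split and Kaplan-Meier estimate described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `split_at_median` and `kaplan_meier` are hypothetical, samples exactly at the median are assigned to the low group (the text does not state how ties were handled), and the values in the usage lines are made-up toy numbers rather than study data.

```python
from statistics import median


def split_at_median(expression):
    """Label each sample 'high' if strictly above the median, else 'low'.

    Assumption: ties at the median go to 'low'.
    """
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : True if a DFS event occurred at that time, False if censored
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Events and total departures (events + censored) at time t.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if d > 0:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy usage (hypothetical values, not study data).
groups = split_at_median([1.0, 3.0, 2.0, 4.0])
curve = kaplan_meier([1, 2, 3, 4, 5], [True, False, True, False, True])
```

In practice, one curve would be estimated per expression group within each age and subtype stratum, and the groups compared with a log-rank test or a Cox model to obtain p-values and hazard ratios of the kind reported above.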

DISCUSSION
In accordance with previous reports, our analysis reveals a higher frequency of high grade and endocrine insensitive breast tumors in young women as compared to older women, as well as age-related differences in the relative frequency of breast cancer subtypes [15]. Specifically, we found an increased frequency of HER2-enriched and basal breast cancer subtypes in young as compared to older women. Despite similarities in receipt of systemic chemotherapy, young women with HER2-enriched and Luminal B breast cancer had inferior outcome compared to older women within the same subtype. We noted inferior survival for young women in the HER2-enriched subgroup, as well as a trend toward poor outcome in young women with basal and luminal B breast cancer. Based on the fact that patients in two of the three data sets (NKI295 and GSE4922) were diagnosed before 2003, it is likely that the majority of HER2-positive patients in this study were treated in the pre-Herceptin era [8] [9], and did not receive targeted therapy. It is possible that, with the advent of Herceptin, newer data sets may show less profound age-related differences. However, the baseline inferior survival of young women with HER2-positive disease is noteworthy. We also identified age-related differences in the expression of several key genes associated with proliferation, invasion and metastasis, some of which predicted inferior DFS in younger women. In univariate and multivariate modeling (accounting for subtype and grade), overexpression of ANGPTL4, MYBL2 and VEGFA was associated with inferior DFS for both the young and older age groups.
For three genes in the data set (KRT5, KRT6A and KRT6B), there was a significant association between gene expression and inferior prognosis unique to young women (with overexpression of EGFR of borderline significance). Overexpression of ANGPTL4 was associated with inferior outcome for young women with basal breast tumors; the same held true for the keratins (KRT5, KRT6A, and KRT6B) among young women with HER2-enriched breast cancer. Kaplan-Meier survival analysis illustrates inferior DFS for young, but not older, patients with HER2-enriched breast cancer overexpressing KRT6A and KRT6B. Taken together, these data suggest that the keratin genes may be involved in young adult cancers (beyond those of the basal subtype) and that overexpression of these genes may negatively impact outcome for women with young adult breast cancer. Finally, our analysis points toward ANGPTL4 as a gene whose overexpression may be associated with poorer outcome among younger women with aggressive basal breast cancer.
Several recent studies have reported biological differences in the breast cancers of young women that extend beyond those attributable to age-related variation in subtype distribution. Using bacterial artificial chromosome (BAC) array comparative genomic hybridization (aCGH), Thomas and Leonard [6] identified preliminary evidence of predictable chromosomal copy number differences between grade 3, node negative breast tumors of young (< 45 years of age) vs. elderly women (> 70 years of age). Benz (2008) [7] noted differences in invasiveness and angiogenesis in tumors of young vs. older women, suggesting age-related differences in epigenetic regulation. In 4,000 clinically annotated breast cancer cases, those arising in older women were less aggressive and grew more slowly than those of younger women, even after controlling for both grade and expression of hormone receptors and HER2. Tumor protein extracts were analyzed by immunoassay for expression of 11 biomarkers selected to correlate with proliferation, angiogenesis, and endocrine dependence. Notably, while expression levels of uPA and VEGF, markers of angiogenesis and invasiveness, did not differ in an age-specific manner, the clinical impact of expression levels differed by age. For women with node-negative, ER-positive tumors, high expression levels of either of the two markers correlated with inferior DFS only for young patients (< 45 years of age), but not for older patients (> 70 years of age). This observation suggests an age-specific response among biologically similar tumors. Finally, Azim et al. [16] evaluated the prognostic significance of previously published gene signatures related to stroma, immunity and proliferation in breast cancers arising in young women (< 40 years of age) compared to older women. They found that stromal gene signatures had prognostic value only for young women with ER-negative, HER2-negative breast cancer, but not for older women, suggesting a role for tissue microenvironment in the pathogenesis of young adult breast cancer. Compared to breast cancers of older women, young adult breast cancers were relatively enriched for immature mammary cell populations and growth factor signaling, with relative downregulation of genes related to apoptosis. The authors concluded that these features of young adult breast cancers could potentially promote aggressive tumor growth. A difference in methodology between that study and ours is that, in their study, subtype was defined by a 3-gene classifier (ESR1, ERBB2 and AURKA), whereas our study classified tumors based on the PAM50.
The subgroup of "core basal" tumors overexpressing EGFR and cytokeratin 5/6 is particularly prevalent in young women under 40 as compared to older women. Previous studies have noted poor outcome associated with high expression of cytokeratin 5, 6A and 6B in basal tumors [5], but to our knowledge this is the first report that the association between cytokeratin expression and inferior DFS may be an age-related finding, present in young women with HER2-enriched as well as basal tumors, but not in older women with breast cancer. Thus, it is possible that high expression of cytokeratins 5 and 6 (CK 5/6) may be a more generalizable indicator of poor outcome in young breast cancer patients. CK 5/6 expression in primary breast tumors has previously been identified in association with the development of brain metastases or metastases at multiple sites [17]. Taken together, these data suggest the possibility that CK 5/6 may be involved in the clinically aggressive behavior of breast cancers in young adults.
Interestingly, the same did not hold true for EGFR (another gene that defines the "core basal" subtype), which in our study had no impact on DFS in HER2-enriched breast cancers, but did approach significance for basal breast cancers in young women. When corrected for subtype and grade, our multivariate analysis shows that high expression of ANGPTL4, a potential druggable target, strongly predicts inferior DFS in young but not older women with basal-type breast cancer. Kaplan-Meier survival curves suggest an association between high expression of ANGPTL4 and inferior DFS for both the basal and HER2-enriched subtypes. While this correlation is present in both the young and older age groups, it is more pronounced in the young patients. ANGPTL4 is a secreted matricellular protein that is broadly expressed in many types of malignant tumor and is associated with poor prognosis in oral cancer [18]. ANGPTL4 plays a critical role in cancer growth and progression and specifically contributes to breast cancer metastasis by protecting endothelial cells from apoptosis, promoting angiogenesis, and facilitating cell migration [18] [19]. High expression of ANGPTL4 in primary breast tumors is strongly associated with metastasis to the lung and has also been implicated in brain metastasis in breast cancer [20] [21]. It is well recognized that patterns of metastatic spread differ by breast cancer subtype, with the basal subtype highly prone to brain and lung metastases [22] [23]. The role of ANGPTL4 in breast cancer metastasis to both lung and brain makes it an interesting potential druggable target. As a direct target of HIF-1, ANGPTL4 is a candidate for clinical intervention using digoxin, which inhibits HIF-1 and has been shown to decrease tumor growth and lung metastasis in breast cancer cell lines and xenografts [24] [25].
The use of either general angiogenesis inhibitors or specific agents against ANGPTL4 may prove particularly beneficial for young adult patients with basal breast cancer, a high risk population in need of more effective therapeutics.
We recognize that our study had several limitations. Survival analyses were impacted by the fact that the databases include limited information regarding the specifics of cancer therapy. Our survival analyses, therefore, are exploratory in nature and will require validation in larger, population-based studies. Instead of a genome-wide exploratory analysis, we selected a smaller number of genes to analyze based on published reports suggesting a potential role in the development of breast cancer in young women. Focus on candidate genes identified through a literature search may decrease the potential for bias due to multiple testing that is inherent in comparative studies of global gene expression. We also recognize that there is no single best way to explore the biology of young women's breast tumors. In this study, we took a similar approach to Azim et al. [16]. While our previous large scale analysis of gene expression did not reveal striking age-related differences [4], targeted analysis of genes relating to proliferation, invasion and metastasis within breast cancer subtypes suggests significant age-related differences in several key genes (i.e., ANGPTL4 and cytokeratins 5 and 6) that may hold prognostic significance.
Figure legend: DFS analysis between high expression and low expression patients was performed for young patients with basal-like, older patients with basal-like, young patients with HER2-enriched and older patients with HER2-enriched breast tumors, respectively.

CONCLUSIONS
Taken together, these data are preliminary, yet provocative, and should be validated in future studies. If validated, this information may prove useful to young women and their physicians as they make treatment decisions for early stage and/or advanced breast cancer, especially as genotyping becomes more commonplace in clinical practice. Remaining unanswered questions include (1) the biological basis for the preponderance of aggressive subtypes of breast cancer arising in younger women and (2) the role of the microenvironment in the development of young adult breast cancers, both of which are research subjects worthy of further pursuit.

Patient selection and breast carcinoma samples
Microarray data from three publicly available, clinically annotated breast cancer data sets, NKI295 [8] (n = 259; normal-like tumors were excluded from the analysis), GSE4922 [9] (n = 205) and GSE20624 [4] (n = 314), were used for the analysis. At the time of this analysis, these data sets were selected based on (1) their inclusion of a substantial number of patients under the age of 40, (2) our ability to merge platforms to conduct the analysis, and (3) their inclusion of all 17 genes of interest on the respective platforms. The NKI295 and GSE20624 data sets were generated by two-channel Agilent microarray while the GSE4922 data were based on Affymetrix one-channel microarray. We used the normalized data from the original studies, which have been row (gene) median centered and column (sample) standardized. Batch correction was performed on the three data sets (n = 778) using an empirical Bayes approach [10]. A total of 778 clinically annotated breast tumor samples from the three data sets were available for analysis. All three data sets included information on age, breast cancer subtype, hormone receptor status (ER/PR), tumor size, tumor grade and nodal status. None of the data sets contained information about familial risk of breast cancer. Two of the three data sets (NKI295 and GSE20624) also contained information on treatment (Table 1).

Clinicopathological characteristics and breast cancer subtype assignment
The following clinicopathologic variables were available for analysis: ER status (positive/negative), tumor size (T ≤ 2 cm, T > 2 cm), tumor grade (1, 2, 3), lymph node status (positive/negative), treatment (chemotherapy [yes/no], chemotherapy and endocrine therapy [yes/no], endocrine therapy only [yes/no], or no systemic therapy). In addition, the 50-gene Prediction Analysis of Microarray (PAM50) classifier was applied to the data and classified breast tumors as Luminal A (LumA), Luminal B (LumB), HER2-enriched, and basal [11]. For the purposes of this analysis, the Normal-like classification was not included. Fisher's exact test implemented in R (http://www.r-project.org/) was used to evaluate the association between each clinicopathological variable and the age groups (< 40 years and ≥ 40 years). A two-tailed p value < 0.05 was considered statistically significant.
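To make the test concrete, the following is a minimal Python sketch of a two-sided Fisher's exact test for a single 2x2 table (for example, age group by ER status). It is an illustration only and not the R code used in the study: the function name `fisher_exact_two_sided` is hypothetical, the two-sided p-value is defined as the sum of the probabilities of all tables no more likely than the observed one, and the counts in the usage line are invented toy values rather than study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be age group (< 40, >= 40) and columns a binary
    clinicopathologic variable; all margins are treated as fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Toy usage (hypothetical counts, not study data).
p = fisher_exact_two_sided(3, 1, 1, 3)
```

The odds ratios quoted in the results are not computed here; different software packages estimate the odds ratio for a 2x2 table in different ways, so a sample cross-product ratio may not reproduce a published value exactly.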

Selection of candidate genes
We conducted a PubMed search for genes that are associated with poor outcome in breast cancer, focusing on genes implicated in breast cancer proliferation, invasion, metastasis or patient survival. Search terms included: breast cancer gene expression and metastasis, breast cancer gene expression and death, breast cancer and early onset. The 17 genes selected, along with their biologic functions, are listed in Table 2.

Differential analysis of single gene expression between age groups
Patients were categorized into two groups: young (aged < 40 years) and older (aged ≥ 40) at breast cancer diagnosis. Age-specific differences in single-gene mRNA expression values were tested using linear regression models (lm function in R). The analysis was conducted at both univariate and multivariate levels for each gene. In the multivariate model, we adjusted for significant clinical variables to include tumor grade and tumor subtype (Table 1). Although the estrogen receptor was a significant clinical variable, this variable was not included in the multivariate model as it is known to correlate with breast cancer subtype. The corresponding p-values from univariate and multivariate models were adjusted using the multiple testing procedure developed by Benjamini and Hochberg [12].
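The Benjamini and Hochberg adjustment applied to the per-gene p-values can be sketched as follows. This is a minimal Python illustration of the standard step-up procedure, assuming it was applied across the tested genes; it is not the authors' R code, the function name `benjamini_hochberg` is hypothetical, and the p-values in the usage line are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each sorted p-value p_(k) is scaled by m / k, and a running minimum
    taken from the largest p-value downward keeps the adjusted values
    monotone in the raw p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy usage (hypothetical p-values, not study data).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

As a worked check of the arithmetic, the smallest of 17 raw p-values is multiplied by at most 17 under this procedure, so a raw value of 0.002 adjusts to at most 0.002 × 17 = 0.034. This is the same relation as between the unadjusted and adjusted p-values reported for ANGPTL4 in the basal subtype.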