Main

Breast cancer is the most common cancer in women worldwide. Today, the majority of patients with operable breast cancer are offered systemic therapy to reduce the risk of distant recurrence. Preoperative chemotherapy reduces the tumour burden of operable and locally advanced breast tumours (clinical stage IIIB–C) and thereby improves resectability, with the same survival effect as postoperative chemotherapy (van der Hage et al, 2001; Wolmark et al, 2001; Rastogi et al, 2008; de Azambuja et al, 2014). In addition, preoperative treatment serves as an in vivo chemosensitivity test, allowing for early evaluation of the efficacy of chemotherapy (Schott and Hayes, 2012).

Clinical response rates to preoperative chemotherapy range from 60 to 80%, whereas pathologic complete response (pCR) rates are 10–20% (Fisher et al, 1998; Smith et al, 2002), although both differ among tumour subtypes. According to a previous report, the pCR rates for hormone receptor-positive (HR+)/human epidermal growth factor receptor 2-negative (HER2−), HR+/human epidermal growth factor receptor 2-positive (HER2+), hormone receptor-negative (HR−)/HER2+, and HR−/HER2− subtypes were 13%, 19%, 48%, and 29%, respectively (Iwata et al, 2011). Patients who achieve pCR have a better prognosis compared with those who do not (van der Hage et al, 2001; Wolmark et al, 2001; Rastogi et al, 2008). Studies investigating the factors associated with tumour response to chemotherapy have shown that markers of tumour cell proliferation, including Ki-67 staining, histologic grade, negative oestrogen receptor (ER) status, and HER2 overexpression, are significantly associated with the pCR rate (Petit et al, 2004; Dowsett et al, 2006; Andre et al, 2008; Nishimura et al, 2010). However, these results come with some degree of controversy and little indication of their clinically applicable predictive value (Burcombe et al, 2005). The biological mechanisms that influence tumour responsiveness in the preoperative setting, including tumour recurrence, are not clearly understood.

Gene expression profiling studies in human tumours have provided new insights into the genes and pathways that contribute to tumourigenesis and the gene expression signatures that are prognostic of patient outcome. Previous studies aimed at discovering genes associated with breast cancer recurrence have uncovered several genes associated with cellular proliferation, resulting in the identification of genes associated with poor patient prognosis (Huang et al, 2003; Dai et al, 2005). These gene expression data have the potential to aid the determination of accurate, individualised prognosis, for example, through Agendia’s MammaPrint and Genomic Health’s 21-gene Oncotype Dx systems (van 't Veer et al, 2002; Paik et al, 2004; Paik et al, 2006). However, these approaches are based on node-negative, HR-positive early breast cancers, and no studies thus far have clearly identified gene signatures predicting both pCR and disease-free survival (DFS) from one cohort.

Here, we aimed to analyse the clinicopathological and gene expression profiles to predict pCR and DFS in early breast cancers managed by preoperative chemotherapy, using data from long-term follow-up of two prospective studies.

Materials and methods

Patients and samples

We performed an integrated analysis of patient data from our two previous consecutive prospective phase II studies in the preoperative chemotherapy setting, ‘Trial A’ (Tamura et al, 2011) and ‘Trial B’ (Ando et al, 2014). The two studies included patients with HER2+ and/or HER2−, and HR+ and/or HR− tumours. The trials had similar eligibility criteria; in brief, patients had histologically confirmed, previously untreated, unilateral, non-inflammatory invasive breast cancer. Histologic confirmation of invasive cancer was performed by core needle biopsy, and HER2− disease was defined as an immunohistochemistry score of 0 or 1+, or a HER2 gene copy number to chromosome 17 ratio of <2.0 by fluorescence in situ hybridisation (Wolff et al, 2007). Patients had clinical stage IIA–IIIC primary measurable disease. Other requirements included age ≥18 years, Eastern Cooperative Oncology Group (ECOG) performance status (PS) score of 0–1, and adequate organ function (white blood cell count ≥4000/μl, platelet count ≥100 000/μl, haemoglobin concentration ≥9.0 g dl−1, serum bilirubin ≤2.0 mg dl−1, aspartate aminotransferase and alanine aminotransferase ≤100 IU l−1, serum creatinine ≤institutional upper limit of normal range, PaO2 ≥60 mm Hg, and baseline left ventricular ejection fraction >50%).

Patients received 5-fluorouracil/epirubicin/cyclophosphamide (500/100/500 mg m−2) q3w × 4 cycles, followed by paclitaxel (PTX) (80 mg m−2) q1w × 12 cycles or docetaxel (75 mg m−2) q3w × 4 cycles or, if HER2+, PTX/trastuzumab (loading dose 4 mg kg−1, maintenance dose 2 mg kg−1) q1w × 12 cycles.

This study was conducted according to a protocol approved by the institutional review board and independent ethics committee, and informed consent was obtained from all patients for the use of biopsy specimens and the analysis of clinical information.

Clinical statistical analysis

Pathologic complete response was defined as no invasive residual tumour in breast and nodes, with noninvasive breast residuals allowed (ypT0/is, ypN0), a definition commonly used by MD Anderson Cancer Center, the Austrian Breast and Colorectal Cancer Study Group, and the Neo-Breast International Group (Green et al, 2005). Disease-free survival was estimated from the date of induction of adjuvant chemotherapy to the date of relapse or death from any cause (only relapses were considered events) using the Kaplan–Meier method. Potential predictive factors for pCR and recurrence were recorded, including patient age (≥35 vs <35 years), tumour stage (IIA–IIB vs IIIA–IIIC), subtype (HR+/HER2− vs HR+/HER2+, HR−/HER2+, and HR−/HER2−), nuclear grade (1 vs 2–3), pCR status (pCR vs non-pCR), and lymph node involvement. Nuclear grade, pCR, and lymph node status were assessed on surgical specimens by two pathologists. A logistic regression model was used to estimate odds ratios for pCR. Cox proportional hazards regression was used to investigate the prognostic factors for DFS.
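As an illustration of the Kaplan–Meier method named above, the following is a minimal sketch of the product-limit estimator in plain Python. The function name and the toy follow-up data are hypothetical and are not taken from the study; the analyses reported here were run in SAS and R.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator).

    times  : follow-up times in months (toy values, not study data)
    events : 1 = relapse observed, 0 = censored
    """
    n_at_risk = len(times)
    relapses = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone leaves the risk set at their own time
    surv, curve = 1.0, []
    for t in sorted(leaving):
        d = relapses.get(t, 0)
        if d:
            surv *= 1.0 - d / n_at_risk  # conditional survival at time t
            curve.append((t, surv))
        n_at_risk -= leaving[t]
    return curve

# Hypothetical toy cohort of eight patients.
toy_times = [6, 12, 12, 20, 30, 42, 60, 60]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the risk set up to their last follow-up but never lower the curve themselves, which is why the estimate only steps down at observed relapse times.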

Microarray process and statistical analysis

The mRNA was extracted from fine-needle biopsy samples performed at diagnosis before preoperative chemotherapy. cDNA was constructed from the mRNA of the breast cancer tissues by standard RT procedures. The probes were prepared by immunofluorescence labelling with cDNA and hybridised to chip arrays containing 54K probe sets (Affymetrix U133 plus 2.0). The data were processed by the robust multiarray average algorithm (Irizarry et al, 2003; Gautier et al, 2004) using GeneSpring ver. 12.6 (Agilent Technologies, Santa Clara, CA, USA). These processed values were corrected for batch effects using the ComBat function in the R package sva (Leek et al, 2012). Institution, age, stage, menopausal status, subtype, grade, ductal carcinoma in situ, pCR, and event DFS were used as covariates to correct for batch effects.

To identify a genomic signature of DFS using the data set after preprocessing, the 54 613 probes were filtered down to 104 probes for which the P-values of both the univariate Cox regression analysis and the significance analysis of microarrays (SAM) (Tusher et al, 2001) test were <0.01. For the 104 probes, an iterative backward elimination feature selection procedure was applied using three-component partial Cox regression analysis (Li and Gui, 2004), where the partial Cox coefficients of the probes were used for ranking (Ahdesmaki et al, 2013). During each iteration, the lowest-ranking probes were discarded after calculating the C-index (Harrell and Frank, 2010), and this was continued until two probes remained. The C-index was calculated for each signature length (i.e., from 104 down to 2 probes). In this study, we took the length with the highest C-index as the optimal length of the genomic signature.
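The C-index used to score each candidate signature length can be sketched as follows. This is a minimal plain-Python version of Harrell's concordance index and covers only the scoring step; the partial Cox fitting and probe ranking are not reproduced. The function name and toy values are hypothetical.

```python
def harrell_c_index(times, events, scores):
    """Fraction of comparable patient pairs ranked concordantly.

    A pair (i, j) is comparable when patient i relapsed before patient j's
    follow-up ended. It is concordant when i also has the higher risk score.
    Tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: higher score should mean earlier relapse.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [2.0, 1.5, 0.3, 0.1]
c_good = harrell_c_index(toy_times, toy_events, toy_scores)
c_bad = harrell_c_index(toy_times, toy_events, toy_scores[::-1])
```

In a backward elimination loop of the kind described, a value like this would be recomputed after each round of probe removal and the signature length with the largest value retained.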

Next, we developed the final genomic signature including these probes based on the partial Cox regression analysis. The DFS event risk based on the developed signature was quantified by a risk score, defined as the linear combination of the signature probe values, each first centred by subtracting the mean value of that probe, multiplied by their corresponding partial Cox model coefficients. That is, the risk score for patient i can be written as

$$\mathrm{RS}_i = \sum_{j=1}^{L} \beta_j \left( x_{ij} - \bar{x}_j \right)$$

where L is the optimal signature length, βj is the coefficient of the partial Cox regression model for probe j, x_ij is the expression value of probe j in patient i, and x̄_j is the sample mean of probe j. The risk scores have by definition a sample mean of zero, and larger values indicate a higher risk of DFS events.
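The risk score described above (a coefficient-weighted sum of mean-centred probe values) can be sketched in a few lines. The expression matrix and coefficients below are hypothetical toy numbers, not the fitted partial Cox coefficients of the 17-gene signature.

```python
def risk_scores(expr, betas):
    """Risk score per patient: sum_j beta_j * (x_ij - mean_j).

    expr  : list of patients, each a list of L probe values
    betas : list of L model coefficients
    """
    n = len(expr)
    n_probes = len(betas)
    means = [sum(row[j] for row in expr) / n for j in range(n_probes)]
    return [
        sum(betas[j] * (row[j] - means[j]) for j in range(n_probes))
        for row in expr
    ]

# Hypothetical toy data: three patients, two probes.
toy_expr = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
toy_betas = [0.5, -1.0]
toy_scores = risk_scores(toy_expr, toy_betas)
# Dichotomise at zero: non-negative scores form the high-risk group.
toy_high_risk = [s >= 0 for s in toy_scores]
```

Because every probe is centred on its own sample mean, the scores sum to zero across the cohort, so a cut-off at zero needs no further tuning.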

Finally, we calculated the internal accuracy of the final genomic signature. The DFS curves of patients with a risk score ≥0 and those with a risk score <0 were compared by the Kaplan–Meier method and log-rank test. The hazard ratio and its 95% confidence interval for the risk score dichotomised at zero were also estimated by ordinary Cox regression analysis.
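A two-group log-rank comparison of the kind described can be sketched as below. This is a minimal plain-Python version that uses the one-degree-of-freedom chi-square approximation; the function name and toy data are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical toy groups: early relapses versus long censored follow-up.
high_t, high_e = [2, 4, 6, 8, 10], [1, 1, 1, 1, 1]
low_t, low_e = [50, 60, 70, 80, 90], [0, 0, 0, 0, 0]
chi2_stat, p_val = logrank_test(high_t, high_e, low_t, low_e)
```

When one group has no events at all, as in this toy example, the test still works, but a Cox hazard ratio estimated on the same data would be unstable, which is one way very wide confidence intervals can arise in small cohorts.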

We also identified the genes that were strongly correlated with pCR. Similar to the analyses for DFS, the 54 613 probes were filtered down to 363 probes for which the P-values of both the univariate logistic regression analysis and the SAM test were <0.001. For the 363 probes, an iterative backward elimination feature selection procedure was applied using ridge regression analysis (Friedman et al, 2010). Based on the area under the receiver operating characteristic curve (AUC), we selected the probe set with the highest AUC as the strongly correlated gene set.
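The AUC criterion can be illustrated with a short rank-based sketch: the AUC equals the probability that a randomly chosen pCR case receives a higher model score than a randomly chosen non-pCR case. The function name, labels, and scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels : 1 = pCR, 0 = non-pCR
    scores : model output, higher meaning more likely pCR
    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy predictions for five patients.
toy_labels = [1, 1, 0, 0, 0]
toy_preds = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_preds)
```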

All statistical tests were two sided. These analyses were performed using SAS software (version 9.3; SAS Institute Inc, Cary, NC, USA) and custom R code.

Average-linkage hierarchical clustering of genes and arrays was performed. We also conducted the clustering analysis using probes significant to the following gene groups: A, cell cycle regulator; B, signal; C, Rho family-related protein; D, blank; E, angiogenesis-related protein; F, growth factor; G, cytokine; H, apoptosis factors; I, DNA transcription factors/damage response, repair, and combination; J, metabolism/translation/protein turnover; K, detoxification enzyme; L, transporters/nucleocytoplasmic transporters/symporters and antiporters; M, cytoskeletal proteins; N, hormone-related and receptors; antibody-dependent cell-mediated cytotoxicity (ADCC), ADCC activators; Y, glyceraldehyde-3-phosphate dehydrogenase as a positive control.
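For readers unfamiliar with average-linkage clustering, the following minimal sketch shows the agglomerative procedure in plain Python. The distance metric used in the study is not stated; Euclidean distance is assumed here purely for illustration, and the function name and toy points are hypothetical.

```python
import math

def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two clusters whose mean pairwise distance is
    smallest until n_clusters remain. Euclidean distance is an assumption.
    """
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(points[a], points[b])))

    def linkage(c1, c2):
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

# Hypothetical toy expression profiles (two probes per sample).
toy_points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
toy_clusters = average_linkage(toy_points, 3)
```

This quadratic-time version is meant only to show the merging rule; clustering of tens of thousands of probes would normally rely on an optimised library routine.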

Apart from the above-mentioned analysis, we re-identified the probes that correlated with pCR. To this end, we used the cohort from Trial A as a training set, and the cohort from Trial B as a validation set. Similar to the above-mentioned analysis, we first filtered the probes using the χ2 test to remove probe sets that did not display significant variation in expression across arrays concerning pCR, a method used by the Cancer and Leukaemia Group B study for acute myeloid leukaemia (Marcucci et al, 2008). As a result, a total of 8700 probes met the filtering criteria and were included in the next step, where Wilcoxon’s P-values by univariate and permutation tests were applied. The top-ranked probes were verified using the validation set based on a support vector machine with a linear kernel (Furey et al, 2000). The number of genes used in the formula was determined by fivefold cross-validation. The HER2 status was also included in the formula.
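The fivefold cross-validation step can be sketched generically as follows. To keep the example self-contained, a nearest-centroid classifier is swapped in for the linear-kernel support vector machine used in the study; it is a stand-in only, and all names and toy data are hypothetical.

```python
def fit_centroids(X, y):
    """Stand-in classifier (not the study's SVM): per-class mean vectors."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    """Assign the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )

def kfold_error(X, y, k, fit, predict):
    """Mean misclassification rate over k folds (sample i goes to fold i % k)."""
    n = len(X)
    errors = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / n

# Hypothetical toy data: one marker, well separated between classes.
toy_X = [[0.1], [0.2], [0.0], [0.3], [0.15], [2.0], [2.1], [1.9], [2.2], [2.05]]
toy_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cv_error = kfold_error(toy_X, toy_y, 5, fit_centroids, predict_centroid)
```

To choose the number of markers, an error estimate like this would be computed for each candidate marker count on the training set, and the count with the lowest error carried forward to the validation set.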

Results

Clinicopathological features predictive of pCR and DFS

Between July 2007 and December 2010, 122 patients were enrolled in the two consecutive prospective studies. Of these patients, 107 underwent long-term follow-up. The median follow-up time from the start of preoperative chemotherapy was 64.1 months (range 14.3–106 months). Table 1 summarises the patient characteristics. The median age was 51 (23–75) years. The respective numbers of patients with ECOG PS of 0 and 1 were 115 and 7; with tumour stages IIA, IIB, IIIA, IIIB, and IIIC were 30, 57, 20, 14, and 1; and with tumour status HR+/HER2−, HR+/HER2+, HR−/HER2+, and HR−/HER2− were 51, 18, 29, and 24. The characteristics and results of the 107 patients were comparable to those of the original cohort (122 patients). The pCR rate of the 107 patients was 28% (HR+/HER2− : HR+/HER2+ : HR−/HER2+ : HR−/HER2−=2.1% : 38% : 70% : 35%). Table 2A shows the multivariate analysis of clinicopathological factors predictive of pCR. The pCR rate was significantly higher in HR+/HER2+, HR−/HER2+, and HR−/HER2− breast cancers compared with HR+/HER2− cancers (P=0.004, P<0.001, and P=0.007, respectively). The 5-year DFS for all subtypes was 77.4%. The survival curves for each subtype are shown in Figure 1. The HR−/HER2− subtype had a significantly poorer prognosis (P=0.0045). Multivariate analysis revealed that non-pCR, age <35 years, HR−/HER2− subtype, and ≥4 positive axillary lymph nodes were significant poor prognostic factors (P=0.03, P=0.005, P<0.001, and P<0.001, respectively), as shown in Table 2B.

Table 1 Baseline characteristics of patients from the two preoperative trials
Table 2A Factors predictive of pCR assessed by multivariate logistic regression model (n=107)
Figure 1

Kaplan–Meier disease-free survival curves of patients stratified by subtype (n=107). The median follow-up time was 64 months. The 5-year DFS% (95% confidence interval (CI)) were as follows: HR+/HER2−, 79.9% (64.8, 89.0); HR+/HER2+, 83.1% (47.2, 95.5); HR−/HER2+, 86.7% (64.3, 95.5); and HR−/HER2−, 56.5% (34.3, 73.8).

Table 2B Factors predictive of DFS assessed by multivariate Cox regression model (n=107)

Microarray analysis

Primary breast cancer tissues were obtained from the 107 patients by fine-needle biopsy; 78 (73%) samples contained sufficient mRNA for cDNA microarray analysis. Results of the gene signature analysis identified an 8-gene expression profile predictive of pCR and a 17-gene expression profile most associated with DFS, as shown in Table 3A and B. With the DFS 17-gene signature, which includes the apoptosis-related cysteine peptidase caspase-8 encoding gene (CASP8), patients were classified into a low-risk group (n=45) and a high-risk group (n=33) according to the risk score determined by the partial Cox regression model (Figure 2). A Cox proportional hazards regression to investigate the prognostic factors, including the gene expression profiles of low- and high-risk factors, for DFS showed that the gene profiling was the strongest factor (Table 4).

Table 3A Eight genes contained in the pCR discriminating profile (n=78)
Table 3B Seventeen genes contained in the DFS discriminating profile (n=78)
Figure 2

Patients were classified into low-risk (n=45, bold line) and high-risk (n=33, dotted line) groups with the 17-gene signature predicting DFS by partial least-squares Cox regression. The hazard ratio was 67.8 (95% confidence interval (CI), 3.70–1240), P=0.0045.

Table 4 Factors predictive of DFS assessed by multivariate Cox regression model in patients from gene analysis cohort (n=78)

Based on gene clustering analysis, the breast cancers were classified according to the following immunohistochemistry subtypes: luminal A and B, HER2-enriched, and triple-negative breast cancer (TNBC). Triple-negative breast cancer was further divided into two clusters most likely indicating basal-like and claudin-low intrinsic subtypes, as previously reported by Sorlie et al (2001), indicating the high quality of the representative method (Supplementary Data and Figure 1). The genes used in the clustering analysis are shown in Supplementary Table 1. When the clustering analysis using significant gene groups A to N and Y was performed according to HER2 status, HER2+ breast cancers were successfully clustered using the B group gene probes, whereas HER2− breast cancers were clustered appropriately with the gene probes of the N, J, I, and H groups (data not shown).

In addition, using the training set, we identified the top-ranked gene probes associated with pCR (Wilcoxon’s P-value <0.01). On the basis of the 100 top-ranked genes (Supplementary Table 2), the training set was also used for model selection and resulted in the creation of a gene model for the HER2− breast cancers (HR+/HER2− and HR−/HER2−) (Supplementary Figure 2). This analysis included 49 samples, which included 35 from the training set and 14 from the validation set. The discrimination markers predicting pCR were analysed using the training and validation sets, and the lowest test error of 7% was observed when the number of markers fitted in the model was four; these included HER2− status and the following three genes: promyelocytic leukaemia protein (PML), tryptophanyl-tRNA synthetase (TrpRS), and choline kinase alpha (CHKA) (Supplementary Figure 3). A similar analysis including the HER2+ subset failed to show a well-fitted model.

Discussion

Breast cancer is heterogeneous in terms of prognosis and response to chemotherapy, even among known intrinsic subtypes. Results from our clinicopathological data revealed that HER2+ and HR−/HER2− subtypes were predictive of pCR. A significantly poor DFS was observed in the HR−/HER2− subtype and in lymph node-positive tumours. These results are comparable with results previously reported such as a study by Liedtke et al (2008); therefore, our cohort of two combined trials is representative of the general cancer population. Liedtke et al (2008) found that the TNBC subtype had a higher pCR rate than the non-TNBC subtype, but a significantly decreased survival. This was explained by the heterogeneity of the TNBC subtype, as some patients achieved pCR with a good prognosis, whereas the majority did not achieve pCR and had a significantly worse prognosis than other subtypes. Lehmann et al (2011) proposed that TNBC could be classified into seven molecular subtypes, among which the pCR rate differed, as shown by Masuda et al (2013). These data, together with our own, suggest heterogeneity even among the known breast cancer subtypes. The finding that four or more positive axillary lymph nodes constituted a poor prognostic factor correlates with historical data showing that lymph node positivity is the most established and reliable prognostic factor for subsequent metastatic disease and survival (Fisher et al, 1993).

The microarray analysis that included the cohorts from two prospective preoperative studies yielded preliminary data showing that an 8-gene signature predicted pCR and a 17-gene signature predicted DFS. With the 17-gene signature we were able to discriminate low- and high-risk patients with a high hazard ratio of 67.8, showing a higher discriminating value than single clinical variables (e.g., histologic grade, HR status, HER2 status). Many studies have aimed to identify gene profiles predicting response and prognosis, including commercially available panels (van 't Veer et al, 2002; Paik et al, 2004; Paik et al, 2006; Nishio et al, 2014). Most are based on node-negative early breast cancers, and include only HR− or HR+ cancers. This study combined all subtypes, and identified two sets of gene clusters that, respectively, predicted pCR and DFS. By including all subtypes in the analysis, we sought a molecular feature that explains the difference in chemosensitivity and prognosis beyond the known biomarkers. There is more hidden biology than the subtypes already defined by HER2 and hormone receptor proteins, as the prognosis differs even within the same subtype (Liedtke et al, 2008). Also, discordance of subtype between primary and metastatic sites has been reported (Falck et al, 2013; Yao et al, 2014), which also implies that the cancer is driven by factors beyond the known biomarkers. Most molecular analyses in the literature are carried out in breast cancers of a certain HER2 or HR status; our analysis is therefore novel in that the identified signatures may be useful regardless of subtype, including in patients with discordances or changes of subtype at metastatic sites or at relapse.

The 8-gene signature may identify which patients will benefit most from preoperative chemotherapy and which should initially undergo surgery. With the DFS prediction signature, we may be able to predict those who will benefit from additional adjuvant chemotherapy or extended hormone therapy.

The genes that were chosen in our two signatures did not show an overlap with genes in other microarray studies, namely those assessed with the MammaPrint (Agendia, Irvine, CA, USA) and Oncotype DX (Genomic Health, Redwood City, CA, USA) panels. Possible explanations are that our study included all subtypes regardless of node status, as well as patients who were treated or not treated with trastuzumab. Nonetheless, we identified a gene signature with a high prognostic power, assuming that molecular pathways other than the already known molecular subtypes are associated with tumour recurrence. Further, in our study the gene signatures predicting pCR and DFS did not share any genes. This result is supported by our clinicopathological data sets, where a discordance between pCR and DFS also occurred among subtypes. The HR−/HER2− subtype was associated with a significantly poor DFS compared with other subtypes, despite a relatively high pCR rate, which was also shown by Liedtke et al (2008). The NeoALTTO trial showed that dual anti-HER2 inhibition significantly increased pCR rate compared with trastuzumab alone, but at the same time did not show a survival difference between treatment groups (de Azambuja et al, 2014). A recent meta-analysis concluded that the pCR rate should not be used as a surrogate marker for survival (Cortazar et al, 2014). Therefore, response to chemotherapy and prognosis may result from different biology. Using the two different signatures for analysis of tumour response to preoperative chemotherapy and its prognosis may provide a new means of making individual therapeutic decisions.

Further, discordance of genomic signatures among studies is a well-discussed topic, and it is known that independent signatures may be similar in terms of outcome predictions despite a lack of gene overlap. Our Supplementary Data analysis in the HER2− subtype, resulting in a 3-gene signature, also did not show an overlap with the 8- or 17-gene signature. One explanation is that the number of samples used was different (49 vs 78). Also, two different analysis methods were used. The Supplementary Data analysis used a support vector machine with a linear kernel, which mainly focuses on the error rate, rather than the odds ratio.

Our 8-gene signature predicting pCR included genes associated with tumourigenesis such as ADAM metallopeptidase domain 17 (ADAM17), glutaminase-2 (GLS2), and HOXA6. The ADAM17 protein catalyses the release and activation of ligands such as transforming growth factor-α, which is essential in activating epidermal growth factor receptor (Blobel, 2005), and high expression of ADAM17 was associated with shorter survival in breast cancers (McGowan et al, 2008). Recently, in vitro data suggested that an antibody against ADAM17 had antitumour effects in TNBC cells (Caiazza et al, 2015). Glutaminase is involved in the Warburg effect in cancer cells, and two human glutaminase genes, GLS1 and GLS2, have been identified (Erickson and Cerione, 2010). Alkyl benzoquinones that preferentially inhibit GLS2, and thereby reduce carcinoma cell proliferation and induce autophagy via AMPK-mediated mTORC1 inhibition, have been reported (Lee et al, 2014), suggesting that GLS2 may be a potential anticancer target. HOX genes encode a highly conserved family of homeodomain-containing transcription factors that have crucial roles in determining the identity of cells and tissues during embryogenesis. Aberrant HOX gene expression has been linked to a variety of adult malignancies (Shah and Sukumar, 2010). In a study analysing the expression of 39 HOX genes in malignant breast tissues, HOXA6 showed low expression levels along with HOXB8 and HOXC5 in malignant tissues, whereas the other HOX family genes were expressed at higher levels (Hur et al, 2014). Our 17-gene signature that predicted DFS included CASP8, an apoptosis-related cysteine peptidase encoding gene. Caspase-8 has a central role in the transmission of the death signal in the death receptor (extrinsic) pathway of apoptosis by coupling the stimulation of death receptors to the activation of intracellular signalling cascades that eventually lead to cell death (Barnhart et al, 2003). It is frequently inactivated in tumours of breast, colon, or lung (Shivapurkar et al, 2002). Therapeutic agents such as interferon-γ and peptides induce caspase pathways, and microtubule-stabilising agents such as taxanes promote CASP8-mediated apoptosis, possibly via upregulation of components of the tumour necrosis factor-related apoptosis-inducing ligand pathway (Nimmanapalli et al, 2001; Muhlethaler-Mottet et al, 2004) or by amplification of CASP8 activation via microtubule-anchored death effector domain ‘filaments’ (Mielgo et al, 2009). Based on these data, we assume that patients classified into the high-risk group by our 17-gene signature may benefit from more intensive chemotherapy, and those in the low-risk group may be able to avoid extra chemotherapy.

As for the Supplementary Data, analysis using the training and validation sets identified HER2− status as well as three genes, namely PML, TrpRS, and CHKA, as the most accurate markers for predicting pCR. PML is a tumour suppressor gene, as its expression is lost in several types of human cancers and is associated with tumour grade and progression (Gurrieri et al, 2004). TrpRS has been documented to function in proangiogenic responses (Wakasugi and Schimmel, 1999; Mirando et al, 2014). Human TrpRS exists in two forms: full-length protein and truncated (mini) TrpRS. The expression of mini TrpRS is stimulated by the antitumorigenic IFN-γ (Rubin et al, 1991), and in a protein signature analysis with TNBC, TrpRS was identified as a good prognostic marker (Campone et al, 2015). CHKA catalyses the phosphorylation of choline and has been shown to be upregulated in many cancer types, including breast, lung, colorectal, and prostate cancer (Katz-Brull et al, 2002). When the gene is amplified, it modulates ER-driven proliferation and ER/oestrogen response element transactivation (Lopez-Knowles et al, 2015). With this three-gene model, the validation set was used for performance evaluation, which resulted in a 93% accuracy rate on validation. The clustering analysis revealed that the HER2+ cancers correlated with only the B group gene probes, whereas the HER2− cancers showed a correlation with the N, J, I, and H groups. According to a pooled analysis of studies of gene modules and response to preoperative chemotherapy (Ignatiadis et al, 2012), high module scores for chromosomal instability, phosphatase and tensin homolog loss, and E2F3 transcription factor were associated with increased pCR probability in HER2− tumours. These findings, along with our current results, suggest that HER2− cancers possess marked heterogeneity, with many different processes and pathways associated with tumour growth or sensitivity to chemotherapy. This is in contrast to HER2+ cancers, in which HER2 amplification-driven oncogenesis has a much more significant role than other pathways. Therefore, we believe that, with the proper sets of genes, pCR could be predicted more successfully in HER2− cancers than in HER2+ cancers.

Some limitations of our analysis must be mentioned. First, the relatively small sample size may have increased the false-positive discovery rate. Second, all patients in the analysis received anthracycline plus taxane-based preoperative chemotherapy regimens. It is therefore not known if the associations between significant genes and pCR or DFS are anthracycline-specific or taxane-specific, or instead indicate general chemosensitivity. Third, validation of the gene signatures was not performed owing to the lack of events; this will be conducted in the future. Fourth, our analysis integrated all subtypes, which is also the novelty of our analysis, but identifying a predictive gene signature for each subtype may have been more suitable for clinical use. However, if the analysis had been carried out for each subtype, the number of cases would have been limited and the power of the test reduced, leading to a less meaningful gene signature. We believe that enrolling patients in a prospective trial, as performed in our study, should decrease the number of accidental findings related to associated genes. Furthermore, we performed an integrated analysis of the two trials, using them as a training set and validation set, respectively.

Our clinicopathological and gene expression analyses for prediction of response to chemotherapy and recurrence resulted in two preliminary sets of genes predictive of pCR and DFS, which may provide guidance regarding individual therapeutic decisions. RNA genomic analysis was feasible in 73% of the specimens, resulting in successful molecular classification, suggesting that HER2− tumours have a higher likelihood of gene clustering. We wish to further expand and validate this analysis by using integrated data from larger retrospective populations from previously published studies (Madden et al, 2013) or global prospective trials.