Background

Gene expression tests are new tools to clinically determine the risk of relapse in early-stage breast cancer (BC) [1, 2]. The 21-gene recurrence score (Oncotype DX) and Mammaprint (NKI-70) [3–10] have been shown to impact treatment decisions [11–13]. Novel prognostic tests, such as EndoPredict and PAM50, are also able to predict early, as well as late metastases [14, 15].

EndoPredict is a standardized test for the molecular pathology laboratory and was the first multigene test used in a decentralized setting [16, 17]. It was established and validated in two independent clinical validation studies (ABCSG6 and ABCSG8) involving patients with ER+/HER2− BC treated with adjuvant endocrine therapy (ET) only [18]. EndoPredict provides prognostic information beyond all common clinicopathological parameters [18] and clinical guidelines [19]. The molecular information (EP score) is further combined with tumor size and nodal status resulting in the EPclin score.

The PAM50 assay is an optimized gene set used to identify intrinsic subtypes and predict the Risk Of Recurrence (ROR) at 10 years [20, 21]. The ROR score was developed in a microarray-based cohort of node-negative, untreated BC patients [6, 20]. Four versions of ROR exist in the research setting: ROR based on subtype information (ROR-S), ROR-S with proliferation (ROR-P), ROR-S with tumor size (ROR-T), and ROR-P with tumor size (ROR-PT) [20, 21]. The minimum ROR score of all Luminal B scores was assigned as the low-risk threshold for each model and the maximum ROR score of all Luminal A scores as the high-risk threshold [21].
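The threshold rule described above can be illustrated with a minimal sketch. It assumes each training sample carries a ROR score and a subtype label; the function name `ror_thresholds` and the labels `"LumA"` and `"LumB"` are hypothetical and not taken from the PAM50 software.

```python
def ror_thresholds(scores, subtypes):
    """Derive (low, high) ROR cut-offs from subtype distributions.

    Assumption: `scores` and `subtypes` are parallel lists from a
    training cohort; the labels "LumA"/"LumB" are illustrative only.
    """
    lum_a = [s for s, t in zip(scores, subtypes) if t == "LumA"]
    lum_b = [s for s, t in zip(scores, subtypes) if t == "LumB"]
    if not lum_a or not lum_b:
        raise ValueError("need at least one Luminal A and one Luminal B sample")
    low = min(lum_b)   # minimum Luminal B score -> low-risk threshold
    high = max(lum_a)  # maximum Luminal A score -> high-risk threshold
    return low, high
```

Because the cut-offs depend only on where the subtypes fall along the score, they are not calibrated to observed survival outcomes.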

The inclusion of established clinicopathological risk factors in ROR and EP scores, such as tumor size (ROR-T, ROR-PT, and EPclin) and nodal status (EPclin), increases their predictive performance [18–21]. EndoPredict and research-based PAM50 were evaluated independently in the GEICAM/9906 trial [22–25]. We compared the prognostic performance of the EP test with the research-based, non-standardized PAM50 assay in node-positive, ER+/HER2− BC patients treated with adjuvant chemotherapy followed by ET.

Patients and methods

Patients and tumor samples

Patients in this study participated in the GEICAM/9906 trial, a randomized phase III trial that compared an adjuvant chemotherapy regimen of 5-fluorouracil, epirubicin, and cyclophosphamide (FEC) with FEC followed by weekly paclitaxel (FEC-P), and then followed by 5-year hormonal therapy (tamoxifen, aromatase inhibitors or both) in 1246 women with lymph node-positive disease [26]. This trial was performed in accordance with the Declaration of Helsinki, approved by the ethics committees at all participating institutions (see Supplementary Table 1S) and the Spanish Health Authority, and registered at www.clinicaltrials.gov (NCT00129922). Patients provided their written informed consent for therapy randomization and molecular analyses. Patients whose tumors were ER+/HER2− according to a central review by qRT-PCR and who consented to genomic analysis were eligible.

Formalin-fixed paraffin-embedded (FFPE) tumor blocks used to compare PAM50 and EP scores were collected at the time of surgery.

EndoPredict gene expression analysis

RNA extraction and gene expression analysis for identifying the ER+/HER2− subgroup and performing the EndoPredict test have been described recently. Briefly, total RNA was extracted from one 5-µm whole FFPE tissue section using a silica bead-based, fully automated isolation method (Tissue Preparation System, VERSANT Tissue Preparation Reagents, Siemens Healthcare Diagnostics) [27].

To identify patients with ER+/HER2− tumors, ESR1 and ERBB2 gene expression levels were analyzed and predefined cut-off levels were applied as recently described. The EP score is based on eight cancer-related genes (BIRC5, UBE2C, DHCR7, RBBP8, IL6ST, AZGP1, MGP, STC2) and three reference genes (CALM2, OAZ1, RPL37A), and measured by qRT-PCR [18].

PAM50-ROR gene expression analysis

RNA was extracted from two 1-mm FFPE cores as previously described [20]. To determine the research-based versions of the ROR scores and groups, normalized gene expression data obtained from the qRT-PCR platform were gene-median-centered, and the microarray-based PAM50 intrinsic subtype predictor was applied as previously described [20]. Of note, the microarray-based training dataset, from which survival coefficients were derived, is based on patients with node-negative disease who did not receive adjuvant systemic therapy [6]. In addition, the ROR thresholds (low and high) were derived from the subtype distributions along the ROR scores in the training dataset [21].
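Gene-median-centering, as mentioned above, can be sketched as follows. This is an illustration only: it assumes the normalized data are held as a mapping from gene name to a list of per-sample values, and the function name `gene_median_center` is hypothetical rather than part of the published pipeline.

```python
from statistics import median


def gene_median_center(expr):
    """Subtract each gene's median (across samples) from its values.

    Assumption: `expr` maps gene name -> list of normalized expression
    values, one per sample, in a consistent sample order.
    """
    centered = {}
    for gene, values in expr.items():
        gene_median = median(values)
        centered[gene] = [v - gene_median for v in values]
    return centered
```

After centering, every gene has a median of zero across the cohort, which puts the qRT-PCR data on a scale comparable to the centered training data before a predictor trained elsewhere is applied.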

Statistical analysis

The ROR and EP scores were calculated blinded to clinical data and sent to the GEICAM study group in Madrid for independent statistical analysis. Only the GEICAM group had access to the combined clinical outcome and gene expression data.

The primary endpoint was distant metastasis-free survival (MFS) estimated using the Kaplan–Meier method. Pearson correlation coefficients were used to compare gene signatures. Two-sided log-rank tests were used to compare subgroups. P values <0.05 were considered statistically significant. Each gene signature was added to a model containing common clinical parameters and one other evaluated signature. C-indices were then calculated for the clinical variables and model combinations to estimate the performance of each variable for predicting distant metastasis. Differences were evaluated using the likelihood ratio test (for Cox proportional hazards models) and the comparison of c-indices with resampling (both one-sided tests).
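For readers unfamiliar with the Kaplan–Meier method named above, a minimal product-limit sketch is given here. It is not the software used in the study; the function name `kaplan_meier` and the input layout (follow-up times plus an event flag, 1 for distant metastasis and 0 for censoring) are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    Assumption: `events[i]` is 1 if a distant metastasis was observed at
    `times[i]`, and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the conditional survival at t
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve
```

The 10-year MFS rate of a risk group would then be the last survival value at or before 10 years on that group's curve.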

The PAM50 ROR-S, ROR-P, ROR-T, and ROR-PT scores classified patients as low-, intermediate-, and high-risk, using the following pre-defined cut-off values, respectively: ROR-S (<24; 24–53; >53), ROR-P (<12; 12–53; >53), ROR-T (<29; 29–65; >65), and ROR-PT (<18; 18–65; >65). Both EP and EPclin categorized patients into low-risk (EP score <5; EPclin score <3.3) and high-risk groups (EP score ≥5; EPclin score ≥3.3) [18]. The following clinical parameters were used for the analysis: number of positive nodes (1–3; 4–10; >10); tumor size in cm (≤1; >1–≤2; >2–≤5; >5); grade (1; 2; 3); age; and treatment arm (FEC; FEC-P).
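The cut-offs listed above translate directly into a small classification routine. The sketch below uses the thresholds exactly as stated in the text; the dictionary and function names (`ROR_CUTOFFS`, `EP_CUTOFFS`, `classify_ror`, `classify_ep`) are hypothetical and chosen only for this example.

```python
# (low, high) cut-offs per ROR model, as given in the text:
# score < low -> low risk; low <= score <= high -> intermediate; > high -> high
ROR_CUTOFFS = {
    "ROR-S": (24, 53),
    "ROR-P": (12, 53),
    "ROR-T": (29, 65),
    "ROR-PT": (18, 65),
}

# Single cut-off per EndoPredict score: score < cut-off -> low, else high
EP_CUTOFFS = {"EP": 5.0, "EPclin": 3.3}


def classify_ror(model, score):
    """Three-group classification for a research-based ROR score."""
    low, high = ROR_CUTOFFS[model]
    if score < low:
        return "low"
    if score <= high:
        return "intermediate"
    return "high"


def classify_ep(model, score):
    """Two-group classification for EP or EPclin."""
    return "low" if score < EP_CUTOFFS[model] else "high"
```

For example, `classify_ror("ROR-S", 24)` falls in the intermediate group, while `classify_ep("EPclin", 3.3)` is high-risk because the EPclin boundary value belongs to the high-risk group.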

Results

Patient population

A total of 566 (71 %) of 800 available tumor samples were eligible for evaluation. Tumor samples lacking PAM50 or EndoPredict data were excluded (5 %). Characteristics of the patient cohort included in this study are summarized in Supplementary Table 2S.

Risk categorization

Patients with ER+/HER2− BC were classified as low-risk in 32, 20, and 25 % of cases based on the ROR-S, ROR-P, and EP scores, respectively (Supplementary Table 3S). All gene signatures identified low-risk groups with a significantly better outcome compared to the other risk groups (Fig. 1). The 10-year MFS rates for low-risk groups were 87, 89, and 93 %, respectively (Fig. 1). The EPclin low-risk group was smaller (13 %) than the ROR-T (22 %) and ROR-PT (19 %) low-risk groups, but had a better, though not statistically significant, 10-year MFS rate (100 vs. 88 vs. 92 %, Fig. 1).

Fig. 1

Kaplan–Meier curves for metastasis-free survival by EP, ROR-S, ROR-P, EPclin, ROR-T, and ROR-PT risk groups. PAM50 ROR-S, ROR-P, ROR-T, and ROR-PT scores stratify patients (GEICAM/9906, N = 536) into low-, intermediate-, and high-risk groups. EP and EPclin stratify patients into low-risk and high-risk groups for distant recurrence. Numbers in parentheses indicate the 95 % confidence interval of the hazard ratio. EP EndoPredict score, EPclin EP based on tumor size and nodal status, ROR risk of distant recurrence, ROR-S ROR based on subtype, ROR-P ROR based on subtype and proliferation, ROR-T ROR based on subtype and tumor size, ROR-PT ROR based on subtype, proliferation, and tumor size

Comparing EP versus PAM50 gene signatures

As continuous variables, EP was significantly correlated with ROR-S (r = 0.72) and ROR-P (r = 0.68). Combining the intermediate- and high-risk groups based on ROR-S (ROR-T) and ROR-P (ROR-PT) resulted in a 21 and 20 % discrepancy in patient categorization when comparing EP vs. ROR-S and EP vs. ROR-P classifications, respectively. The MFS of patients with discordant classifications was analyzed to compare EP vs. PAM50 risk assignments, yielding no statistically significant differences. However, EP-based low-risk patients had a better outcome than PAM50-based counterparts (Fig. 2). EPclin-based risk classification proved a better predictor of MFS than the ROR-T score (P = 0.04), but not in comparison to the ROR-PT (P = 0.09) (Fig. 2).
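The two quantities reported above, a Pearson correlation between continuous scores and a discordance rate between risk classifications, can be sketched as follows. The function names are hypothetical. The discordance calculation assumes, as in the text, that intermediate- and high-risk ROR groups are merged into one non-low group before comparison with the two-group EP classification.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def discordance_rate(ep_groups, ror_groups):
    """Fraction of patients whose low vs. non-low assignment differs.

    Assumption: group labels are strings; anything other than "low"
    (i.e. "intermediate" or "high") counts as non-low.
    """
    pairs = list(zip(ep_groups, ror_groups))
    discordant = sum((ep == "low") != (ror == "low") for ep, ror in pairs)
    return discordant / len(pairs)
```

A high correlation and a non-trivial discordance rate can coexist, because patients whose scores lie close to a cut-off can fall on opposite sides of it under the two tests.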

Fig. 2

Kaplan–Meier curves for metastasis-free survival by discordant samples between EP and ROR scores. Kaplan–Meier curves by EP–ROR-S, EP–ROR-P, EPclin–ROR-T, and EPclin–ROR-PT. Numbers in parentheses indicate the 95 % confidence interval of the hazard ratio. EP EndoPredict score, EPclin EP based on tumor size and nodal status, ROR risk of distant recurrence, ROR-S ROR based on subtype, ROR-P ROR based on subtype and proliferation, ROR-T ROR based on subtype and tumor size, ROR-PT ROR based on subtype, proliferation, and tumor size

Prognostic performance of predictors

Compared to clinical parameters, ROR-S, ROR-P, and EP molecular signatures had substantially higher c-indices (Fig. 3) and added significant prognostic information beyond clinical parameters based on c-index analysis and resampling (data not shown). C-indices for EP, ROR-S, and ROR-P were 0.657, 0.639, and 0.633, respectively. C-indices for EPclin, ROR-T, and ROR-PT were 0.693, 0.649, and 0.644, respectively (Fig. 3).
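To make the c-index values above concrete, the following sketch computes a concordance index for right-censored data. It assumes Harrell's definition, which the text does not name explicitly, and the function name `harrell_c_index` is hypothetical. A pair of patients is usable when the one with the shorter follow-up had an observed event; the pair is concordant when that patient also has the higher risk score.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance index for right-censored survival data.

    Assumptions: `events[i]` is 1 for an observed distant metastasis and
    0 for censoring; higher `risk_scores` mean higher predicted risk;
    ties in risk score count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported values between roughly 0.63 and 0.69 indicate moderate discrimination.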

Fig. 3

Distribution of c-indices for clinical and molecular parameters. EP EndoPredict score, EPclin EP based on tumor size and nodal status, ROR risk of distant recurrence, ROR-S ROR based on subtype, ROR-P ROR based on subtype and proliferation, ROR-T ROR based on subtype and tumor size, ROR-PT ROR based on subtype, proliferation, and tumor size

Based on c-indices and resampling, we determined that EP added prognostic information to ROR-P and clinical parameters, but not to ROR-S. C-index was significantly increased by adding EPclin to models containing clinicopathological parameters and ROR-T (P < 0.001), or ROR-PT (P < 0.001). ROR-T and ROR-PT failed to add prognostic information to EPclin (Table 1).
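The notion of one signature "adding prognostic information" to a model that already contains clinical parameters and another signature corresponds to a nested-model comparison. As a sketch in standard notation (the symbols below are introduced here and do not appear in the original analysis), the likelihood ratio statistic for Cox models is:

```latex
\Lambda = 2\left[\ell\big(\hat{\beta}_{\text{full}}\big) - \ell\big(\hat{\beta}_{\text{reduced}}\big)\right],
\qquad
\Lambda \;\overset{H_0}{\sim}\; \chi^2_{k}
```

Here $\ell$ is the Cox partial log-likelihood, the reduced model contains the clinical parameters and one signature, the full model additionally contains the signature under evaluation, and $k$ is the number of parameters added. A large $\Lambda$ indicates that the added signature improves the fit beyond what the reduced model explains.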

Table 1 Additional prognostic information—GEICAM/9906

Discussion

We compared the prognostic performance of research-based and non-standardized versions of PAM50-ROR scores and EndoPredict in ER+/HER2−, node-positive chemotherapy-treated BC patients from the GEICAM/9906 trial.

ROR-S and ROR-P were significantly correlated with EP score, and gene signatures showed agreement in risk classification, indicating that PAM50-ROR and EP scores identify tumors with similar properties. Despite the significant correlation, the discordance rate of 20–21 % can be explained by the tests’ inherent characteristics. Our c-index analysis indicated that only EP added significant information to ROR-P. None of the other molecular signatures added information to each other, suggesting that additional predictors would not improve prognostic performance. These findings are concordant with our previous combined analysis of hundreds of signatures and clinical-pathological data for prognostic prediction in ER-positive breast cancer where we observed that not much more prognostic power was obtained by including hundreds of signatures into a single model beyond the power contained within a well-developed individual signature when combined with clinical variables [28].

The PAM50-based ROR-T and ROR-PT scores include tumor size, whereas the EPclin score considers nodal status and tumor size as part of the risk prediction algorithm. Similar to the research-based version, a ROR-PT score weighted for tumor size and proliferation was used to validate the standardized version of the PAM50 assay in the ATAC and ABCSG8 trials. In our analysis, all hybrid scores contributed to identifying low-risk groups for distant metastasis, although the number of patients and events differed across score categories. The EPclin low-risk group was smaller than the ROR-T and ROR-PT ones and showed no distant-metastatic events. EPclin had been established in a node-positive/node-negative cohort and the predefined cut-off level consequently classified more patients as high-risk in the node-positive GEICAM/9906 trial. In contrast, the research-based versions of ROR-T and ROR-PT scores were derived in a systemically untreated node-negative BC cohort, and thresholds were based on subtype distribution and not actual survival outcomes; therefore, the number of low-risk cases with distant-metastatic events was higher, as reflected by an MFS of 87–92 % in low-risk groups.

Kaplan–Meier analysis of discordant cases, c-index analysis, and log-likelihood tests showed that the EPclin-based risk classification provided independent prognostic information to the ROR-T and ROR-PT scores. The improved performance of the EPclin score over pure molecular scores may be partially explained by nodal status, one of the strongest single prognostic factors, which is included in the EPclin score but not in any of the other models tested. EndoPredict validation studies demonstrated that the molecular EP score, tumor size, and nodal status were the only independent prognostic parameters [18]. Hybrid scores’ superior performance compared to their molecular counterparts supports the recommendation of the Evaluation of Genomic Applications in Practice and Prevention working group to integrate clinicopathological factors into gene expression tests [29] rather than relying on pure RNA-based molecular scores.

To the best of our knowledge, our study reports the first direct comparison of EndoPredict and a research-based version of the PAM50 assay. Earlier comparisons of multigene signatures suggested similar prognostic performances [30, 31]. Recently, the transATAC study, the first large phase III study comparing different standardized gene expression-based biomarkers in the same patient cohort [32], compared the standardized and clinically validated version of the PAM50 assay, developed under the nCounter system (NanoString Technologies), with the 21-gene recurrence score. PAM50-ROR provided more prognostic information than the recurrence score [32]. Although our study did not evaluate the standardized and clinically validated PAM50-ROR score, the GEICAM/9906 trial is an additional valuable source for biomarker comparisons. In the context of this trial, we could identify high-risk patients who need treatment in addition to the standard anthracycline-based chemotherapy (±taxane) and could be eligible for further treatment with novel drugs, such as CDK4/6 or mTOR inhibitors.

Our results should be interpreted in the context of their limitations. First, ROR scores were generated using research-based and non-standardized versions on a qRT-PCR platform. Although research-based PAM50 classification has been evaluated in several clinical trials using qRT-PCR [22, 33, 34], different methods may influence prognostic ability. Of note, large validation studies (ATAC and ABCSG8) for the PAM50 assay were performed using the standardized version with pre-specified cutoffs based on actual survival outcomes (<10, 10–20, and >20 % risk of distant relapse at 10 years) and not subtype distribution [32]. Second, the PAM50 vs. EP comparison was not conducted according to their intended use. Whereas our patients were treated with chemotherapy, these predictors were clinically validated using patient cohorts receiving endocrine therapy alone. Therefore, next steps should compare PAM50 and EP in clinical trials with ER+/HER2− BC patients treated with ET alone. EndoPredict and the standardized PAM50 were recently evaluated in the ABCSG8 trial, which would allow a direct comparison of both clinical predictors [35].

Conclusions

Despite the differences in establishment and the limited overlap in genes, all molecular predictors evaluated showed similar prognostic performance. The addition of clinical parameters, such as tumor size and nodal status, into risk-score determination improves the prognostic ability of these assays.