Introduction

Both adjuvant and neoadjuvant chemotherapy and hormonal treatment have made a major contribution to improving disease-free survival (DFS) and overall survival (OS) in breast cancer [38, 50, 83, 103]. Because these therapies are highly toxic, physicians weigh the risk-to-benefit ratio of a given therapy for each specific patient before prescribing. To guide therapeutic decisions, physicians use clinical and histopathological variables and biomarkers as prognostic or predictive tools; biomarkers are most effective when they are linked with targeted therapies (companion diagnostics), as is the case for estrogen receptor (ER) and human epidermal growth factor receptor 2 (HER2) [35, 85, 106].

The first aim of this systematic review was to evaluate the level of evidence (LOE) for Ki-67 as a prognostic factor or as a predictive factor of response to chemo- and hormonotherapy in patients with invasive breast carcinoma, and to define its weight in the everyday therapeutic decision-making process, in particular within the ER+ tumour group, in order to select the women who are most likely to benefit from chemotherapy. The second aim was to examine the technical and methodological aspects of Ki-67 measurement and the cut-points used for treatment decisions.

We report data from studies using samples from randomized clinical trials (RCTs), cohort studies and case–control studies, and we also summarize the results of systematic and narrative reviews. We paid particular attention to the methodological aspects of the studies. We took into account the recommendations published in 2008 by the National Academy of Clinical Biochemistry and elaborated by an international panel of experts; these agree with and complement those proposed in the ASCO guidelines, in particular through their analysis of data on the quality of the analytical procedures used [51, 101]. In addition, the Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK) score was used to assess the quality of the reporting of the prognostic study results [71].

Recently it has been claimed that results from commercially available genomic profiling tests (e.g. Mammaprint™, Oncotype DX®) can predict which patients should receive therapy. Several genes coding for proliferation factors, a key biological driver, are targeted in these genomic profiling tests [12, 26, 58, 112, 113]. Moreover, recent tests, e.g. MapQuant Dx™ genomic grade and THEROS BCI®, have been developed to assess tumour grade molecularly, since proliferation is a major component of tumour grade [113]. Nevertheless, it remains uncertain whether these genomic profiling tests have significant added value when compared with the histopathological assessment of ER, HER2 and Ki-67, the latter being routinely used as a marker of proliferation in breast cancer, although not yet as a standard [25, 113].

Ki-67 protein is detected during all the active phases of the cell cycle, but is absent in resting cells [65]. Since its discovery in the early 1980s, there has been interest in the role of Ki-67 as a proliferation marker in cancer, particularly in lymphomas and in breast, endocrine and brain cancers. It is commonly used as a complement to grading systems that include mitotic counting as a sign of proliferation. Initially, immunohistochemical detection was performed on frozen tissues, as the available antibodies had lower affinity on fixed tissues. Currently available antibodies provide sufficiently intense immunostaining on paraffin sections, making the test more feasible [61]. Interestingly, Ki-67 is one of the five proliferation genes, out of 16 cancer-associated genes, that contribute an important weight to the Oncotype score [79].

Methods

We searched PubMed to identify prospective or retrospective studies reporting results from analyses of Ki-67 as either a prognostic factor or a predictive factor in women with breast cancer. The terms used for searching were divided into three groups to identify references on: breast cancer and its treatment, Ki-67 and the types of studies (Appendix 1). These were combined into a search strategy that was limited to publications in English from 1 January 1990 to 31 July 2010. The titles and abstracts of the references identified by this search strategy were screened by two methodologists independently. The reference lists of included studies were scanned to identify additional references.

Prognostic studies were included only if they reported OS or DFS as an outcome. In the predictive studies, the outcomes of interest were clinical or pathological partial or complete response.

Data extracted

The items that were extracted from each publication are listed in Appendix 2. The working group validated and agreed on the interpretation of the data and assigned the LOE using the recently revised definition (Table 1) [98]. The REMARK 20-item guideline was used to assess the quality of the reporting in the prognostic studies identified, but only for those using samples from RCTs. The items covered were: the description of patients; specimen characteristics; assay methods; study design; statistical analysis methods; presentation of results; and study objectives and pre-specified hypotheses [71].

Table 1 Summary of definitions of LOE [98]

Results

After screening the 314 references identified by the search strategy, 71 were included in this review (Fig. 1). The main reason for exclusion was the type of breast cancer (ductal carcinoma in situ). Details from the studies, including tumour characteristics, treatment regimen and Ki-67 analysis modalities, are reported in Table 2.

Fig. 1 Summary from the literature search

Table 2 Summary of studies assessing Ki-67 in samples from randomised controlled trials

Samples from randomised clinical trials

We identified 17 studies that analysed samples from patients who had been included in RCTs in the neoadjuvant or adjuvant setting [5, 6, 11, 13, 20, 29, 32–35, 37, 39, 40, 46, 60, 66, 69, 70, 72, 77, 83, 91, 99, 108–110]. Ki-67 was assessed as a prognostic factor for 9,185 patients in ten studies (three with neoadjuvant treatment and seven with adjuvant treatment), as both a prognostic and a predictive factor in three studies involving 411 patients (all with neoadjuvant treatment) and as a predictive factor in four studies involving 520 patients (all with neoadjuvant treatment).

In the majority of studies with Ki-67 as a prognostic factor, both node-negative (pN0) and node-positive (pN+) patients were included. In the univariate analyses, the hazard ratio (HR) for DFS ranged from 1.06 to 2.09. Ki-67 remained an independent prognostic factor in multivariate analyses in seven studies (HR 1.05–1.72). Despite the differences in the methodologies used, particularly the cut-point for Ki-67, the HR values were consistent.

Only five studies used OS as a primary objective to evaluate Ki-67, and one analysed breast cancer-specific survival (BCSS). In the studies with OS, Ki-67 was a statistically significant prognostic factor (HR 1.11–1.83) in univariate analyses; this was not reported for the trial with BCSS as the outcome. Multivariate analyses were reported for four trials and Ki-67 was an independent prognostic factor in only one trial with OS; it was also significant in the study with BCSS.

The REMARK score for these studies ranged from 9 to 18 (on a scale from 0 to 20), with a median of 12 and a mean of 12.8. The LOE for Ki-67 as a prognostic factor for DFS (Table 1) was judged to be I-B since the results were consistent across several studies, done using material from randomized trials and with centralized slide review.

Among the seven studies that evaluated Ki-67 as a predictive factor, either solely or also as a prognostic factor, three studies assessed the response to neoadjuvant chemotherapy [20, 60, 66, 70], one assessed neoadjuvant hormonotherapy [32–35, 99], and three assessed neoadjuvant chemotherapy and hormonotherapy [13, 46, 110]. Only one study [46] concluded that elevated Ki-67 was predictive of response to chemotherapy; therefore, the LOE for Ki-67 as a predictive factor was judged to be II-B.

The trials assessing the predictive value of Ki-67 in an adjuvant setting either evaluated first-generation adjuvant chemotherapy versus no treatment, or compared an optimal versus a sub-optimal regimen. In the IBCSG VIII/IX trial, Viale et al. [109] did not detect any predictive value of Ki-67 for the efficacy of CMF compared with no chemotherapy. In this analysis, the P values for the interaction between Ki-67 and treatment were 0.45 and 0.90 for IBCSG VIII and IX, respectively. In two other randomized trials comparing anthracycline-based versus non-anthracycline-based chemotherapy (NEAT/BR9601), no interaction between Ki-67 and the treatment arms was detected, suggesting that the treatment efficacy was not predicted by the Ki-67 level [6]. Finally, at least two studies assessed the predictive value of Ki-67 for the efficacy of docetaxel. Penault-Llorca et al. [83], using material from the PACS01 trial, reported that high Ki-67 was associated with a higher efficacy of docetaxel. However, these results are insufficient to conclude that Ki-67 is a predictive factor.

In a study published after the literature search for this report, Dumontet et al. [36] analysed tissue specimens for prognostic and predictive factors in the BCIRG 001 trial. They concluded that Ki-67 was an independent prognostic factor in women receiving adjuvant chemotherapy for node-positive breast cancer, but was not a predictive factor for response to docetaxel. Overall, these studies suggest that Ki-67 is not predictive for chemotherapy.

Samples from cohort and case–control studies

We identified 47 cohort studies that assessed the role of Ki-67 as a prognostic factor solely (32 studies; 16,902 patients; patients received neoadjuvant treatment in one study and adjuvant treatment in 25 studies, and no details of treatment were available in six studies), as a predictive factor solely (eight studies; 655 patients; patients received neoadjuvant treatment in six studies, adjuvant treatment in one, and both in one), or as both a prognostic and a predictive factor (seven studies; 1,844 patients; all patients received neoadjuvant treatment) [2–4, 7–9, 14–19, 21–23, 30, 42–45, 49, 54–57, 59, 62, 63, 67, 73, 76, 78, 80–82, 84, 86–90, 92, 93, 96, 97, 104, 105]. We also identified one case–control study (828 patients) in which Ki-67 was assessed as a predictive factor for chemotherapy [1].

About two-thirds of the studies assessing Ki-67 as a prognostic factor (n = 39) reported that it was an independent factor for DFS or OS or both. Of the 15 studies assessing Ki-67 as a predictive factor, seven suggested that it may be a predictive factor for response to treatment. Most studies reported an anthracycline regimen or CMF as chemotherapy and tamoxifen, letrozole or goserelin as hormone therapy. The single case–control study found that a high Ki-67 value (71–100%) was independently predictive of benefit from adjuvant chemotherapy [HR for BCSS = 0.35 (95% CI = 0.18–0.69), P = 0.003].

Meta-analyses

Two meta-analyses were identified [27, 100]. Although they were published within a year of each other, they did not include the same studies (57 and 60% overlap, respectively), and the statistical methods used also differed (Tables 3, 4; Fig. 2). Neither of these meta-analyses distinguished whether the tissue samples came from randomised controlled trials, case–control studies or cohort studies.

Table 3 Comparison of the methods used in the meta-analyses published by de Azambuja et al. [27] and Stuart-Harris et al. [100]
Table 4 Description of meta-analyses of studies of Ki-67 as a prognostic factor
Fig. 2 Distribution of the studies included in the meta-analyses published by de Azambuja et al. [27] and Stuart-Harris et al. [100]. The numbers of studies common to both meta-analyses are shown in the overlapping circles and those unique to either one of the meta-analyses are shown in the non-overlapping parts of the circles. DFS disease-free survival; OS overall survival

In the meta-analysis published by de Azambuja et al. in 2007 [27], the prognostic value of Ki-67 was reported only in univariate analyses for both DFS and OS. In the analysis for DFS, they collected data from 38 studies (including 10,954 patients) and found an HR of 1.88 (1.75–2.02) with a fixed-effect model, but with significant between-study heterogeneity (design, type of patients and results). In the analysis for OS, covering 35 studies (including 9,472 patients), they found an HR of 1.89 (1.74–2.06), also with a fixed-effect model and significant between-study heterogeneity. In sub-analyses, similar results were observed, but no heterogeneity was found for pN+ patients or for untreated patients (pN0 for DFS and pN0/pN+ for OS).

In the meta-analysis of Stuart-Harris et al. in 2008 [100], after adjustment for probable publication bias, a high level of Ki-67 was associated with poor DFS and OS, and this remained statistically significant in multivariate analyses. The pooled adjusted HRs were 2.05 (1.80–2.33) and 1.88 (1.55–2.27) for DFS and OS in univariate analyses, and 1.76 (1.56–1.98) and 1.42 (1.14–1.77) in multivariate analyses, respectively. In the analyses for DFS, there was no evidence of significant between-study heterogeneity, but this was not the case for OS.

The authors of both meta-analyses acknowledged that the included studies used different eligibility criteria, study designs, methods for measuring Ki-67 and cut-point values. Despite these differences, the results are consistent, and thus reinforce the value of Ki-67 as a prognostic factor.
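
As an illustration of what a fixed-effect pooled HR involves, the following minimal sketch applies generic inverse-variance weighting to log hazard ratios. It is not the code or necessarily the exact procedure used in either meta-analysis; the function name `pooled_hr_fixed`, the recovery of standard errors from 95% confidence intervals and the input values are assumptions made for illustration only, and the inputs are hypothetical rather than data from the reviewed studies.

```python
import math


def pooled_hr_fixed(studies, z=1.96):
    """Pool hazard ratios with a fixed-effect, inverse-variance model.

    `studies` is a list of (hr, ci_low, ci_high) tuples with 95% CIs.
    The standard error of each log HR is recovered from the CI width,
    which assumes the CI is symmetric on the log scale.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for hr, ci_low, ci_high in studies:
        log_hr = math.log(hr)
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
        weight = 1.0 / se ** 2          # inverse-variance weight
        weighted_sum += weight * log_hr
        weight_total += weight
    pooled_log = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical inputs, NOT taken from any study in this review.
example = [(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)]
hr, low, high = pooled_hr_fixed(example)
print(f"pooled HR {hr:.2f} ({low:.2f}-{high:.2f})")
```

Pooling two identical hypothetical studies leaves the point estimate unchanged and narrows the interval, which is why a fixed-effect estimate can look precise even when the underlying studies are heterogeneous.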

Narrative reviews

Four narrative reviews were identified [24, 107, 111, 115]. None of these reviews assessed the predictive value of Ki-67. Two of them, Weigel and Dowsett [111] and Yerushalmi et al. [115], summarized the results from the meta-analyses described above. Colozza et al. [24], who looked at several markers, included 15 studies (5,137 patients) for Ki-67. They concluded that Ki-67 was a statistically significant prognostic factor but not a standard one at present, due to the lack of standardization of pre-analytical steps, staining procedures and scoring methods. Urruticoechea et al. [107] reviewed 40 trials involving more than 11,000 patients. They found strong evidence that Ki-67 was a prognostic factor for pN0 patients in univariate analyses, and that it remained significant in multivariate analyses. In the studies with pN+ patients or mixed pN0/pN+ patients, the results were less clear, although one study concluded that Ki-67 was a candidate biomarker for predicting docetaxel efficacy in ER+, pN+ breast cancer [83].

Discussion

Early detection and improvements in systemic neoadjuvant and adjuvant therapies explain the observed decrease in mortality in breast cancer [53]. However, since chemotherapy is associated with adverse effects, it is important to be able to tailor treatment strategies for each patient. Companion diagnostic tests, such as HER2 or ER measurements, which are predictive by nature, already play a key role in daily therapeutic strategies. In parallel, non-associated tests, such as proliferation biomarkers, continue to be investigated in the hope of finding reliable tools to help identify the women who are most likely to benefit from chemotherapy.

Ki-67 was significantly associated with DFS in multivariate analyses in seven RCTs and two meta-analyses with consistent HRs or relative risks (RRs) [27, 29, 32–35, 39, 40, 83, 99, 100, 108, 109]. Although more heterogeneous, similar results were reported in studies using samples from cohort studies. The HRs and RRs reported for Ki-67 in most of these studies were within the same ranges as those found for other validated prognostic markers (ER, HER2, uPA, node status, histological grade) (Table 5) [10, 64, 94, 100, 113].

Table 5 Summary of assessment of various markers as prognostic factors for DFS in women with breast cancer

The evidence reviewed here, with consistent results between the studies allowing the attribution of an LOE I-B, validates the use of Ki-67 as a prognostic factor for DFS in patients receiving adjuvant therapy. As none of the studies was specifically designed to assess Ki-67 as a prognostic factor, the LOE cannot be I-A. An LOE I defines a marker that is ready for clinical use, thereby justifying its status as a biomarker as suggested by Diamandis [31]. This level is based on the hierarchical classification for medical utility of a biomarker proposed by Simon et al. in 2009 [98], a revision of the initial classification proposed by Hayes et al. in 1996 [52]. It differs dramatically from the LOE proposed by Colozza et al. [24], who suggested a level III or even IV. However, it should be emphasised that our conclusion is based on results from studies using samples from RCTs with central review of the marker; that was not the case in the review conducted by Colozza et al., which included studies published before 2004. This implies that standardization of the techniques and counting methods, ensuring efficient and practical alternatives to centralized testing (i.e. automated staining and image analysis), will be necessary for everyday practice.

The results from the studies using samples from patients included in RCTs do not provide sufficient proof to conclude that Ki-67 is a predictive factor for short-term or long-term response to chemotherapy, since the study designs were not suitable for answering this question. The LOE is therefore II-B, and a higher LOE will only be possible if suitably designed prospective studies are conducted. Nevertheless, an association between high Ki-67 expression at baseline and immediate response to hormonotherapy or chemotherapy in the neoadjuvant setting was reported in seven case series [14, 15, 73, 78, 82, 84, 90], two of them with pathological complete response (pCR) [73, 78]. The studies in the neoadjuvant chemotherapy setting analysed pCR as the endpoint. In contrast, the studies in the neoadjuvant hormonotherapy setting used a clinical response endpoint. In breast cancer samples from women with incomplete pathological response after neoadjuvant therapy, the Ki-67 expression in the residual tumour was reported to be prognostic, irrespective of the original pre-treatment value [40, 59, 102].

Prognostic variables are needed in clinical practice. Histological grade can clearly distinguish between low- and high-risk tumours (grade 1 vs. grade 3) in terms of outcomes. However, about 40–50% of breast cancers are classified as grade 2, with a less well-defined risk. The histological grading system is constructed from a parameter of differentiation (glandular formation), nuclear appearance and a clear proliferation parameter (mitotic count). This explains why grade and Ki-67 index are closely linked, and why the grade is not always integrated in the multivariate models used for assessing Ki-67. The fact that such a link exists does not mean that the parameters are redundant, and the Ki-67 index could be particularly useful for sub-classifying a grade 2 population [2]. Patients with ER+ tumours are today systematically treated with hormonotherapy in the absence of contra-indications. It is possible that a Ki-67 assessment prior to deciding whether to propose additional adjuvant chemotherapy might be useful for a subset of ER+ patients with grade 2 tumours.

The choice of the cut-point has a major impact in practice, as it determines which patients are classified as ‘high Ki-67’, and therefore which have a poorer prognosis. These patients will generally receive more aggressive therapy. In the published studies reviewed, many different ways to select a cut-point were used, defining two or three subgroups. These include an arbitrary choice based either on the different cut-points proposed in the literature or on the “significant” mean value from an ‘in house’ series. In our review of studies using samples from RCTs, most arbitrary cut-points for adjuvant treatment choice were distributed between 5 and 34%, with 10 or 20% being the most frequently used values (Table 2).

The use of data-derived ‘optimal’ cut-points can result in serious bias due to different patient populations in each series. It should be stressed that transforming continuous variables, such as the Ki-67 index, into two categories can lead to a loss of power of the biomarker [88, 95, 108, 109]. In addition, this is unrealistic at the individual level, since it suggests that patients whose tumours have Ki-67 levels close to the cut-point but on either side of it are very different, whereas in reality they are probably very similar. Technically, a binary variable is not necessary for statistical analysis, and it has been shown that a model with continuous values provides more information [95]. In clinical practice, one way of expressing the results is to use two cut-point values, which define a central ‘grey’ area between the low and high values. For patients whose Ki-67 level falls in this grey area, other factors could be considered in the decision to offer chemotherapy or not. This is the approach adopted by the St Gallen International Expert Consensus, which recommended the use of Ki-67 to measure proliferation [47, 48, 95]; women with ER+ tumours and ‘high’ Ki-67 (i.e. >30%) should receive chemo-hormonotherapy, those with ‘low’ Ki-67 (≤15%) (luminal A tumours) should receive hormonotherapy alone, and the ‘intermediate’ level (16–30%) is not decisive for the therapeutic decision.
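
The two-cut-point approach described above can be written down as a small sketch. The thresholds (≤15% low, 16–30% intermediate, >30% high) are those quoted in the text for the St Gallen consensus; the function name `classify_ki67` and its parameters `low_max` and `high_min` are hypothetical names introduced here for illustration, and the sketch is not a clinical decision tool.

```python
def classify_ki67(index_percent, low_max=15.0, high_min=30.0):
    """Assign a Ki-67 index (in percent) to one of three bands.

    Defaults follow the cut-points quoted in the text:
    'low' is <= 15%, 'high' is > 30%, anything between is 'intermediate'.
    """
    if not 0.0 <= index_percent <= 100.0:
        raise ValueError("Ki-67 index must lie between 0 and 100%")
    if index_percent <= low_max:
        return "low"
    if index_percent > high_min:
        return "high"
    return "intermediate"  # the 'grey' area: not decisive on its own


for value in (8, 15, 16, 30, 31):
    print(value, classify_ki67(value))
```

With a single cut-point, values of 15 and 16% would fall into opposite groups; with the grey area, the 16% value is flagged as not decisive, so other factors can inform the decision. Passing a different `low_max` changes the width of the intermediate band.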

Ki-67 expression is detected by immunohistochemical techniques on histological slides. Molecular testing using RT-qPCR on fixed, paraffin-embedded tissue samples is also feasible [28] but not used in practice. Both techniques give quantitative results, but the qualitative aspect of tumour heterogeneity is only accessible on histology slides. Comparative studies are in progress but the results are not yet available. Moreover, both techniques, as for all biomarkers, need standardized pre-analytical conditions, which require cooperation between radiologists, surgeons and pathologists. Most laboratories use MIB-1 or SP6 antibodies for immunohistochemistry, which provide highly comparable results, although SP6 appears to be better suited for image analysis [116]. However, the methods of antigen retrieval from paraffin-embedded samples, the concentrations of antibodies, the time of incubation and the amplification reagents vary and may significantly influence the final results [114, 116]. We observed this variation in the studies analysed in this review (Table 2). Automatic immunostaining was reported in only three of the published studies, despite the fact that most laboratories are nowadays equipped with such systems [88, 108, 109]. Also, the way samples are treated immediately after collection and the way they are stored may affect the final results, but only sparse information on this was provided in most of the studies. In general, all studies reported using a negative control. However, there was no standardized positive control for staining calibration. Some studies used tonsil tissue, while others used known highly positive breast cancer tissue. The intensity of nuclear staining that was considered to be positive also varied; in some cases any staining was taken as positive, whereas in others positivity required ‘marked’ staining.

Some studies reported using ‘hot spots’ (or areas of intense staining) for the assessment, whereas others used fields with different intensity of staining, giving the result as a mean value. Significant variation in the number of fields examined, the number of tumour cells counted or estimated, the use of a graticule for counting or the use of automated counting systems was also seen. Some studies reported a double reading of all slides, or of a certain percentage of slides.
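
The difference between a ‘hot spot’ reading and a mean reading can be made concrete with a short sketch. It assumes that each field is summarised by a count of positively stained tumour nuclei and a count of all tumour nuclei, that the hot spot is the field with the highest percentage, and that the mean is computed over all counted cells; these are illustrative assumptions, not a scoring protocol taken from the reviewed studies, and the function name `ki67_indices` and the counts are hypothetical.

```python
def ki67_indices(fields):
    """Return hot-spot and mean Ki-67 indices (percent) from field counts.

    `fields` is a list of (positive_nuclei, total_tumour_nuclei) tuples,
    one per microscopic field.
    """
    if not fields:
        raise ValueError("at least one field is required")
    per_field = []
    positive_sum = 0
    total_sum = 0
    for positive, total in fields:
        if total <= 0 or not 0 <= positive <= total:
            raise ValueError("invalid counts for a field")
        per_field.append(100.0 * positive / total)
        positive_sum += positive
        total_sum += total
    return {
        "hot_spot": max(per_field),                  # most proliferative field
        "mean": 100.0 * positive_sum / total_sum,    # pooled over all cells
    }


# Hypothetical counts for three fields of one heterogeneous tumour.
print(ki67_indices([(10, 100), (40, 100), (25, 100)]))
```

In this hypothetical tumour the two conventions give clearly different values for the same slide, which illustrates why the scoring method needs to be reported alongside the cut-point.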

Due to the limits of histological quantitative analysis and to tumour heterogeneity, which lead to inter- and intra-observer variation in grade scoring, some grade 2 tumours are misclassified as grade 1, and some grade 1 tumours are misclassified as grade 2. The assessment of Ki-67 levels in these borderline cases provides additional information to clinicians. Similar overlap exists between grade 2 and grade 3 tumours, but without a significant impact on the therapeutic decision. In view of the inaccuracies expected in the Ki-67 index values, partly due to the heterogeneity of the techniques as discussed above, and partly due to tumour heterogeneity, it may be useful to generalize automated quantitative image analysis, to report both ‘hot spots’ and mean Ki-67 values, and to expand the 16–30% intermediate level of St Gallen to 11–30% [47]. This wider intermediate level would ensure a better identification of tumours with low and high levels of Ki-67. The risk of making an error when assessing a Ki-67 score <10 or >30% will be low in routine practice, but errors are to be expected for the intermediate level between 11 and 30%, which therefore requires double assessment or automated image analysis.

Reporting key details is essential to assess the reliability of the study results. Initiatives such as the CONSORT guidelines have been shown to improve the quality of reporting for RCTs [74, 75]. In a similar way, the REMARK guidelines were developed to improve reporting of prognostic studies and their results [71]. Mallett et al. [68] reported in 2010 the results from an analysis of reports of prognostic tumour marker studies published in 2006 and 2007 using the REMARK score. The aim of their study was to assess whether the publication of the guidelines had had an immediate impact on the quality of the reported studies. Although most of the studies reported the number of patients in the analyses (98%), only just over half reported the number of eligible patients (56%) and excluded patients (54%). Only 36% of the reports clearly defined the outcomes analysed. The authors concluded that although good reporting is essential for the interpretation and clinical application of prognostic studies, the standards of reporting in 2006 and 2007 were poor. They called for a wider use of the REMARK guidelines to help improve reporting and enhance prognostic research. The results of our review show that articles published prior to the publication of REMARK in 2005 had a lower range of REMARK scores (n = 9; 6–13) than those published after (n = 9; 10–18), which suggests that the quality of reporting has improved.

Conclusions

The results from this review show that Ki-67 provides useful information for therapeutic decisions in breast cancer patients. It is an independent prognostic factor for DFS, and the greatest benefits from Ki-67 assessment could be observed in patients with ER+ breast cancers. It is not predictive for chemotherapy, but high Ki-67 was found to be associated with immediate pCR in the neoadjuvant setting.

In view of these results, international guidelines should help to standardize the pre-analytical phase, the staining techniques and the counting methods. We also need to standardize the cut-point determination to ensure that Ki-67 results can be used with confidence in clinical practice.