Introduction

Breast cancer accounts for more than 20% of all newly diagnosed female cancers [1]. Mortality rates have steadily been reduced because of new treatment regimens and early detection. At the time of initial diagnosis about 7% of the patients present with advanced disease [2]. Patients diagnosed with locally advanced primary breast cancer are considered to be at increased risk of disseminated disease. These patients now receive neoadjuvant chemotherapy (NAC), even if the tumour is primarily operable.

Complete response is associated with favourable outcome [3–6]. To obtain the best possible therapeutic response it is crucial to identify responders and non-responders/poor responders at an early stage, thus facilitating tailored treatment for each patient.

Tumour response during NAC has conventionally been assessed using clinical calliper measurement, supplemented with mammography and ultrasound. These techniques have been found to be unsatisfactory, and the accuracy of dynamic contrast-enhanced magnetic resonance imaging (DCE MRI) in evaluating the extent of residual disease has been proved to be superior to other techniques [7–11]. As a result, MRI is increasingly being used in response evaluation. The Response Evaluation Criteria in Solid Tumours (RECIST) are based on measurement of longest tumour diameter [12]. It has been suggested that volumetric assessment of tumour size may be a more reliable indicator of treatment response than longest diameter, especially in cases of irregular morphology or multifocal disease [13, 14]. The European Society of Breast Imaging (ESOBI) recommends MRI before NAC, halfway during treatment and after the final course of chemotherapy [15]. There is, however, still no consensus regarding optimal time points for MRI evaluation during treatment or criteria for selecting patients likely to benefit from a change of treatment regimen. Varying cut-off values for diameter and volume reduction to differentiate between poor responders and good responders have been reported [16–18]. Currently the ESOBI recommends change of therapy based on MRI only in non-responders and in cases of progressive disease [15].

Dynamic contrast-enhanced MRI visualises functional properties of the tumour in addition to morphological depiction, allowing for detection of changes in enhancement patterns preceding a reduction of tumour size. Diffusion-weighted magnetic resonance imaging (DW MRI) reflects the properties of randomly moving water molecules, providing information on cell membrane integrity and tumour cellularity [19]. Water diffusion is quantitatively assessed by calculation of the apparent diffusion coefficient (ADC). Some studies have shown a negative correlation between pretreatment tumour ADC and treatment response [20], whereas others have found no such correlation [21, 22]. Increase in ADC following cytotoxic treatment supposedly reflects loss of membrane integrity and/or increased extracellular space [23]. In locally advanced breast cancer ADC increase early during NAC has been shown to precede tumour volume reduction [24–26] and may be a promising predictor of pCR [18]; the optimal time point for assessment remains to be established.

The purpose of this study was to explore the predictive value of MRI parameters and tumour characteristics before NAC and to compare changes in tumour size and tumour ADC during treatment, between patients who achieved pathological complete response (pCR) and those who did not.

Materials and methods

Patients

Between April 2007 and October 2008, 31 patients with pathologically confirmed invasive breast cancer were enrolled in this prospective study. Patient and tumour characteristics are listed in Table 1.

Table 1 Patient and tumour characteristics

Study design

MRI examinations, x-ray mammography and ultrasound were carried out at three time points: before NAC (Tp0), after the fourth cycle of NAC (Tp1) and before surgery (Tp2). Clinical assessment and imaging were carried out within the same week. Twenty-nine patients commenced four NAC cycles of 5-fluoro-uracil, epirubicin and cyclophosphamide (FEC) (Fig. 1). Following clinical and radiological assessment at Tp1, 10 patients were administered two additional cycles of FEC and 18 patients were switched to taxane-based regimens, with the addition of trastuzumab in cases of HER2 positivity. Two patients had been switched to taxanes and trastuzumab after 3 courses of FEC, because of poor response and side effects. At Tp2, 21 patients underwent MRI. This final assessment was performed 1–2 days before surgery.

Fig. 1

Study progress flow chart. NAC, neoadjuvant chemotherapy; FEC, 5-fluoro-uracil, epirubicin and cyclophosphamide; MRI, magnetic resonance imaging; BCS, breast conservation surgery; pCR, pathological complete response; time points Tp0, Tp1 and Tp2. FEC: 5-fluoro-uracil (600 mg/m2), epirubicin (100 or 60 mg/m2) and cyclophosphamide (600 mg/m2) administered every 3 weeks. Taxane-based regimens: 4 cycles of docetaxel (100 mg/m2) administered every 3 weeks or paclitaxel (80 mg/m2) weekly for 12 weeks, with the addition of trastuzumab in cases of HER2 positivity (initial dose 8 mg/kg, followed by 6 mg/kg every 3 weeks). Number of days from Tp0 to surgery was 167 (range, 128–203). Twenty-eight patients underwent mastectomy and axillary lymph node dissection, followed by loco-regional radiotherapy; one patient chose breast-conserving surgery. Owing to metastatic disease, 2 patients were not subjected to surgical treatment. * The number of lesions evaluable for pathological response is one more than the number of patients undergoing surgery because one patient had bilateral disease

MR imaging

MRI was performed using a 1.5T MRI system (ESPREE, Siemens, Erlangen, Germany) and a phased-array bilateral breast coil (CP Breast array coil, Siemens, Erlangen, Germany). The MRI protocol included T1-weighted sagittal images, T2-weighted axial images, 3D axial DCE MRI and DW MRI (Table 2). DW MRI was acquired using b-values of 100, 250 and 800 s/mm2. DW MRI was performed before DCE MRI. DCE MRI was acquired every 85 s, one series before contrast medium administration and five post-contrast series. The contrast agent gadopentetate dimeglumine (Magnevist®, Schering, Berlin, Germany) was administered at a dosage of 0.1 mmol/kg body weight.

Table 2 Magnetic resonance imaging (MRI) protocol and sequence parameters

Radiological assessment and clinical reading

The mammograms were double-read by two radiologists. Ultrasound was performed by a trained radiologist, after having read the mammograms. Tumour was measured in two planes if possible. Maximum diameters were used for comparing different techniques.

The MRI examinations were reviewed in a double-reading by radiologists with 5, 6 and 12 years’ breast MRI experience. Morphological assessment was made of tumour extent, obtaining maximum diameter in the axial plane, according to RECIST [12], and the corresponding orthogonal diameter. Craniocaudal diameter was determined on multiplanar reconstructed (MPR) images in the coronal or sagittal plane. All measurements were performed on subtraction images. Tumour volumes (Vcalc) were calculated using the ellipsoid formula, whereas segmented tumour volumes (Vseg) were obtained using a semi-automated segmentation algorithm based on thresholding of the enhancement curve [21].
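The calculated volume can be illustrated with a minimal sketch. The text names "the ellipsoid formula" without writing it out; the sketch assumes the standard form V = (π/6) · d1 · d2 · d3 applied to the three orthogonal diameters described above. The function name and the example diameters are hypothetical and are not taken from the study data.

```python
import math


def ellipsoid_volume_cm3(d_long_cm: float, d_orth_cm: float, d_cc_cm: float) -> float:
    """Ellipsoid volume (cm^3) from three orthogonal diameters given in cm.

    Assumes the standard formula V = pi/6 * d1 * d2 * d3; the paper does not
    state the exact expression it used.
    """
    for d in (d_long_cm, d_orth_cm, d_cc_cm):
        if d < 0:
            raise ValueError("diameters must be non-negative")
    return math.pi / 6.0 * d_long_cm * d_orth_cm * d_cc_cm


# Hypothetical lesion: 5.0 cm axial longest diameter, 4.0 cm orthogonal
# diameter, 3.0 cm craniocaudal diameter.
v = ellipsoid_volume_cm3(5.0, 4.0, 3.0)
print(f"Vcalc = {v:.1f} cm^3")  # prints "Vcalc = 31.4 cm^3"
```

Because the formula treats the lesion as a smooth ellipsoid, it can be expected to overestimate the volume of irregular or multifocal tumours relative to a segmentation-based volume.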

At Tp2 six cases were reported as ‘not measurable’ at MRI. In these cases the reports were ambiguous, stating ‘not measurable because of scattered enhancement’ or ‘not measurable tumour with benign enhancement pattern’. The reporting radiologists reviewed these cases and categorised them as either residual tumour not present (complete response (CR)) or residual tumour present (non-CR). At the time of the second reading the radiologists were blinded to the results of the pathological examination after surgery.

Post-processing of the DW MRI was performed using the commercially available nICE software package (Nordic NeuroLab, Bergen, Norway). ADC maps were calculated using a mono-exponential approach and all three b-values. Mean ADC for each breast tumour was obtained by manually drawing a region of interest (ROI) on the ADC map best depicting the tumour. DCE MRI and native 800 s/mm2 DW MR images were used to guide positioning of the ROI within the solid part of the tumour, avoiding tumour borders and areas of necrosis.
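As an illustration of the mono-exponential approach, the signal model S(b) = S0 · exp(−b · ADC) can be fitted by linear least squares on ln S against b. The sketch below is an assumption about how such a fit can be done and is not a description of the nICE implementation; the function name and the synthetic signal values are hypothetical.

```python
import math


def fit_adc(b_values, signals):
    """Log-linear least-squares fit of S(b) = S0 * exp(-b * ADC).

    b_values in s/mm^2, signals in arbitrary units (must be > 0).
    Returns (ADC in mm^2/s, S0).
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching (b, signal) pairs")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")
    y = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_y = sum(y) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be identical")
    sxy = sum((b - mean_b) * (yi - mean_y) for b, yi in zip(b_values, y))
    slope = sxy / sxx                      # slope of ln S versus b equals -ADC
    s0 = math.exp(mean_y - slope * mean_b)  # intercept gives ln S0
    return -slope, s0


# The three b-values used in the protocol; the signals are synthetic,
# generated from a hypothetical voxel with ADC = 1.1e-3 mm^2/s and S0 = 1000.
b = [100.0, 250.0, 800.0]
sig = [1000.0 * math.exp(-bi * 1.1e-3) for bi in b]
adc, s0 = fit_adc(b, sig)
print(f"ADC = {adc * 1e3:.2f} x 10^-3 mm^2/s")  # prints "ADC = 1.10 x 10^-3 mm^2/s"
```

In practice the fit would be run per voxel to build the ADC map, and the mean over the ROI would then be reported.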

Pathological response evaluation

Treatment response was determined by histopathological examination of the surgical specimen. Pathological complete response was defined as absence of invasive cancer and non-pCR defined as residual invasive tumour of any size, regardless of the presence of ductal carcinoma in situ (DCIS) [27].

Statistical analysis

All statistical analysis was performed using SPSS version 16.0 (SPSS Inc., Chicago, IL, USA). Groups were compared using the non-parametric two-sided Mann-Whitney U test and Spearman’s rank correlation coefficient. Categorical variables were compared using the two-sided Fisher’s exact test. Performance for the different parameters in differentiating between pCR and non-pCR patients was assessed by creating receiver operating characteristic (ROC) curves and comparing the area under the curves (AUCs). Significance level was set at 5%.
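The link between the Mann-Whitney U statistic and the area under the ROC curve can be sketched as follows: the AUC equals the proportion of (pCR, non-pCR) pairs in which the pCR case has the higher value, with ties counted as one half. This is a minimal illustration and not the SPSS procedure; the function name and the ADC-like values are hypothetical, and no significance test is performed.

```python
def auc_from_groups(positives, negatives):
    """AUC as U / (n_pos * n_neg), computed by direct pairwise comparison.

    positives: parameter values for the pCR group
    negatives: parameter values for the non-pCR group
    Ties contribute 0.5, matching the Mann-Whitney U convention.
    """
    if not positives or not negatives:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical values (units of 10^-3 mm^2/s), not study data.
pcr_group = [1.9, 1.6, 1.5, 1.2]
non_pcr_group = [1.4, 1.3, 1.1, 1.0, 0.9]
auc = auc_from_groups(pcr_group, non_pcr_group)
print(f"AUC = {auc:.2f}")  # prints "AUC = 0.90"
```

An AUC of 0.5 corresponds to the reference line of a ROC plot, i.e. no discrimination between the groups.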

Results

MRI assessed treatment response

Figure 2 shows MR images from one patient who obtained pCR and one patient who did not. Changes in tumour size during treatment are shown in Fig. 3. Before treatment, mean longest tumour diameter, Vcalc and Vseg were 5.4 cm, 41.2 cm3 and 27.8 cm3, respectively. At Tp1 mean tumour size reductions were 59%, 76% and 78% for longest diameter, Vcalc and Vseg, respectively. Vcalc was significantly larger than Vseg (p = 0.004) at Tp0, but there was a strong correlation between the two tumour volumes both at Tp0 (r = 0.900, p < 0.001) and at Tp1 (r = 0.827, p < 0.001). At Tp2 mean tumour size reduction from baseline was 68%, 89% and 93% measured as longest diameter, Vcalc and Vseg, respectively.

Fig. 2

Axial greyscale subtraction MR images and colour-coded ADC maps from a patient obtaining pCR (panel a) and a patient not responding to neoadjuvant chemotherapy (panel b) before treatment (left column), after 4 cycles of neoadjuvant therapy (middle column) and before surgery (right column). Both patients had locally advanced invasive ductal carcinoma grade 3. The responding patient showed a 55% reduction in longest tumour diameter at MRI after 4 cycles of NAC. In the non-responding patient tumour diameter increased by 5%. Tumour ADC increased by 155% between Tp0 (0.80 × 10−3 mm2/s) and Tp1 (2.01 × 10−3 mm2/s) in the responding patient, but remained unchanged in the non-responder (1.21 × 10−3 mm2/s and 1.35 × 10−3 mm2/s at Tp0 and Tp1 respectively)

Fig. 3

Tumour size (longest diameter (a), calculated tumour volume according to the ellipsoid formula, Vcalc, (b) and semi-automatic segmented tumour volume Vseg (c)) before treatment (Tp0), after 4 cycles of neoadjuvant treatment (Tp1) and 1–2 days before final surgery (Tp2) for the 29 patients receiving neoadjuvant chemotherapy. Patients who did obtain a pathological complete response (●, n = 10 (11 lesions)), those who did not (∆, n = 17) and those who did not have surgical treatment (×, n = 2). Vseg includes previously published data [21]

Before treatment mean tumour ADC was 1.1 × 10−3 mm2/s, and there was no difference between the pCR and the non-pCR groups. Tumour ADC values at Tp0 and Tp1 are shown in Fig. 4. It is noted that the non-pCR patient with a marked ADC increase (44%) was a near-pCR; pathological assessment showed residual tumour of 1.5 mm. Compared with baseline values, mean tumour ADC for the entire study population was significantly increased at Tp1 (1.4 × 10−3 mm2/s, p < 0.001). Furthermore, mean tumour ADC in the pCR group (1.7 × 10−3 mm2/s; range, 1.0–2.1 × 10−3 mm2/s) was significantly higher than the mean value for the non-pCR group (1.2 × 10−3 mm2/s; range, 0.9–1.7 × 10−3 mm2/s) (p = 0.022).

Fig. 4

Breast tumour ADC before treatment (Tp0) and after 4 cycles of neoadjuvant chemotherapy (Tp1) for patients obtaining pCR (●, n = 11), those who did not (∆, n = 17) and those who did not have surgical treatment (×, n = 1). At Tp0, ADC values were obtained from 27 lesions in 27 patients. At Tp1, the number of lesions from which ADC could be obtained was reduced to 22 due to image artefacts, scattered tumour growth or minimal disease/complete response. Tumour ADC includes previously published data [21]

Pathological examination

Eleven specimens that revealed no invasive tumour were classified as pCR; 6 of these showed DCIS. Seventeen specimens were classified as non-pCR with pathological residual tumour ranging from 1.5 to 80 mm; 6 patients also showed DCIS. Owing to scattered tumour growth, diameter measurements were not obtained from 4 of the specimens.

Correlation between imaging techniques and pathological examination

Correlation between maximum tumour diameter measured at pathology and obtained from imaging at Tp2 was not significant for mammography (r = 0.46, p = 0.21, n = 9), but was significant for ultrasound (r = 0.59, p = 0.01, n = 18) and MRI (r = 0.87, p < 0.001, n = 15).

MRI accuracy

The diagnostic accuracy of MRI for pathological response at Tp2 is summarised in Table 3. The overall MRI accuracy was 77%. Of the HER2-positive lesions, 73% (8/11) obtained pCR. The response rate for HER2-negative lesions was 18% (3/17). Prediction of pCR (negative predictive value) in the HER2-positive group was 100% versus 50% in the HER2-negative group.
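The measures quoted here follow from a 2×2 table of MRI result against pathology. The sketch below assumes the convention implied by the text, in which a "positive" finding means residual tumour (non-CR at MRI, non-pCR at pathology), so that the negative predictive value is the proportion of MRI-negative lesions that are truly pCR. The function name and the counts are hypothetical and are not those of Table 3.

```python
def diagnostic_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic measures from a 2x2 table.

    tp: residual tumour at MRI and at pathology
    fp: residual tumour at MRI, pCR at pathology
    fn: no residual tumour at MRI, non-pCR at pathology
    tn: no residual tumour at MRI, pCR at pathology
    """
    total = tp + fp + fn + tn
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("each row and column of the table must be non-empty")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),  # proportion of MRI-negatives that are pCR
    }


# Hypothetical counts for illustration only.
m = diagnostic_measures(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy = {m['accuracy']:.2f}, NPV = {m['npv']:.2f}")
# prints "accuracy = 0.85, NPV = 0.90"
```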

Table 3 Overall performance of MRI for prediction of pathological response

In the false positive cases the discordance can be explained by the presence of DCIS. Patient 12 showed residual tumour of 9 mm at MRI before surgery, whereas pathological examination showed 9 mm of DCIS. Patients 8 and 10 were categorised as ‘residual tumour present, not measurable’ at MRI; pathological examination showed DCIS. All three obtained pCR but with DCIS present. One false negative case had pathologically minimal residual disease of 1.5 mm. The other false negative had no measurable tumour at MRI, whereas pathological evaluation showed scattered tumour growth.

Prediction of pCR: pretreatment tumour characteristics

Analysing baseline data, we found that HER2 overexpression was the only significant predictor of pCR; odds ratio of 12.4 (95% CI, 2.1–72.5; p = 0.006).
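The point estimate can be checked with a short calculation, assuming the 2×2 table implied by the reported response rates of 8/11 for HER2-positive lesions and 3/17 for HER2-negative lesions. The function name is hypothetical, and the confidence interval and p-value are not reproduced here because the method used to derive them is not stated.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio (a * d) / (b * c) for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Assumed table: HER2-positive 8 pCR / 3 non-pCR, HER2-negative 3 pCR / 14 non-pCR.
or_her2 = odds_ratio(a=8, b=3, c=3, d=14)
print(f"OR = {or_her2:.1f}")  # prints "OR = 12.4"
```

The result, (8 × 14) / (3 × 3) = 12.4, agrees with the reported odds ratio.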

Prediction of pCR: early changes in tumour size

Including all 26 patients, AUC for pCR prediction at Tp1 was 0.78, 0.82 and 0.76 for reduction of tumour diameter, Vcalc and Vseg, respectively. For the patients who were switched to taxane-based regimens after Tp1 (n = 17) the corresponding AUCs were 0.86, 0.89 and 0.86, respectively. For the patients who received 6 cycles of FEC, AUC was not significantly different from the reference line, probably due to the limited number of patients (n = 9). Despite the slight differences in AUC between the taxane group and the study population as a whole, we chose to include all patients in the further analyses.

At Tp1 MRI assessed longest tumour diameter and Vcalc were significantly different between patients achieving pCR and those who did not (Table 4). Percentage changes in all MRI assessed tumour sizes also significantly correlated with pathological outcome, change in Vcalc showing the strongest predictive value (p < 0.001). Sensitivity and specificity for positive prediction of pCR, as a function of varying cut-off values for MRI assessed tumour size reductions at Tp1, are illustrated in Fig. 5. For correct prediction of pCR based on tumour size reduction, the highest combined sensitivity (91%) and specificity (80%) were obtained using a cut-off of 83% reduction in Vcalc.
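The cut-off analysis can be sketched as a scan over candidate thresholds, predicting pCR whenever the size reduction is at or above the threshold and keeping the threshold with the highest sum of sensitivity and specificity. How the "highest combined" value was selected is not specified in the text, so the sum criterion is an assumption; the function name and the data are hypothetical.

```python
def best_cutoff(values, is_pcr):
    """Return (cutoff, sensitivity, specificity) maximising sens + spec.

    values: percentage size reduction per lesion
    is_pcr: True for pCR, False for non-pCR
    A lesion is predicted to be pCR when its value is >= cutoff.
    The first cut-off reaching the maximum is kept if several tie.
    """
    n_pos = sum(1 for y in is_pcr if y)
    n_neg = len(is_pcr) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome groups must be present")
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, is_pcr) if y and v >= cutoff)
        tn = sum(1 for v, y in zip(values, is_pcr) if not y and v < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best


# Hypothetical percentage reductions in calculated volume, not study data.
reductions = [95, 92, 90, 88, 75, 85, 80, 60, 50, 40]
outcomes = [True] * 5 + [False] * 5
result = best_cutoff(reductions, outcomes)
print(result)  # prints "(88, 0.8, 1.0)"
```

The same scan applies unchanged to absolute ADC values or percentage ADC increase in place of size reduction.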

Table 4 Individual magnetic resonance imaging (MRI) parameters versus pathological response

Fig. 5

Specificity (─) and sensitivity (∙∙∙∙) for prediction of pCR at different cut-off values of MRI tumour size reduction at Tp1

Prediction of pCR: early changes in ADC

The AUC for pCR prediction at Tp1 based on tumour ADC was larger for the patients who were switched to taxane-based regimens (AUC = 0.95, n = 15) than for the entire study population (AUC = 0.80, n = 22). We chose to analyse the predictive value of ADC data for the study population as a whole, similar to the approach chosen for tumour measurements. At Tp1 ADC values were significantly different in the pCR and non-pCR groups (p = 0.02), whereas ADC increase was not (Table 4). Figure 6 shows the sensitivity and specificity of pCR prediction using different cut-off values for ADC (a) and percentage ADC increase (b) at Tp1. For tumour ADC, high combined sensitivity (88%) and specificity (80%) were obtained using a cut-off of 1.42 × 10−3 mm2/s.

Fig. 6

Specificity (─) and sensitivity (∙∙∙∙) for prediction of pCR for different cut-off values of ADC (a) and ADC increase (b) at Tp1

Discussion

Pathological complete response is the ultimate goal for NAC as it strongly correlates with a favourable prognosis [4–7]. In this study HER2 overexpression was the only baseline parameter showing significant correlation with pCR. After 4 cycles of NAC, changes in tumour size and tumour ADC were independent, statistically significant and strong predictors of pathological outcome.

The accuracy of MRI for detecting residual disease was 80%. This is in accordance with previously reported studies [28, 29]. The discordance between MRI and the pathological result can partly be explained by our choice of pathological categories, i.e. pCR defined as absence of invasive tumour of any size, regardless of DCIS. In some studies DCIS is defined as residual tumour [30], whereas in other studies pCR and near-pCR are treated as the same category [16]. All three false positives in our study were cases in which DCIS was present. This is comparable to previous results, and it has been stated that these discordances will most likely not have negative clinical consequences [11]. Although the presence of DCIS does not affect long-term outcome [27], DCIS has been shown to increase the risk of ipsilateral breast tumour recurrence and should therefore be diagnosed and resected [11]. Our false negative case with minimal residual disease is a near-pCR and can be expected to have a prognosis similar to that of a pCR case, as it has been shown that prognosis is the same for pCR and ‘near-pCR’ patients, the prognostic factor being total tumour burden [28, 29]. Despite superior accuracy compared with other imaging techniques, MRI still has limitations, mainly in not revealing residual tumour in a scattered pattern as seen in lobular carcinomas and after shrinkage of multifocal tumours [29]. In our study, only one of these cases was a false negative; the other cases were correctly reported as non-CR. The exclusion of such cases, as seen in some studies, would probably increase MRI accuracy.

In our study population there was a significantly higher pCR rate in the HER2-positive patients compared with the HER2-negative patients; prediction of pCR in the HER2-positive group was 100%, versus 50% in the HER2-negative group. A previous study has shown increased MRI accuracy in the HER2-positive subgroup [28], whereas another recent study showed lower MRI accuracy in HER2-positive patients [30]. It has been suggested that the conflicting reports on HER2 positivity affecting MRI accuracy may be attributed to the use of trastuzumab; in the study showing increased accuracy trastuzumab was used. In our study, 8 HER2-positive patients were switched to taxanes and trastuzumab, and 5 of these lesions obtained pCR. The 3 HER2-positive patients who received 6 cycles of FEC also obtained pCR. Five of 6 patients receiving trastuzumab were correctly classified at preoperative MRI, whereas two HER2-positive patients receiving 6 FEC cycles were false positives.

This study showed no significant correlation between measurements at mammography and measurements at pathological examination. Correlation between ultrasound and pathological examination was significant, albeit much lower than for MRI. Moreover, difficulties in obtaining measurements at both mammography and ultrasound severely diminish the clinical value of these examinations in the neoadjuvant setting. Large and irregular tumours make ultrasound measurements difficult before treatment; furthermore ultrasound is in general not recommended for response evaluation of tumours according to RECIST, owing to problems like operator dependency and lack of reproducibility. Fibrosis replacing vital tumour tissue makes response evaluation difficult at both mammography and ultrasound. As stated in the European Guidelines, MRI should be the method of choice in response evaluation.

Radiological assessment should assist the oncologist in differentiating the patients who are likely to achieve pCR with the ongoing treatment from those who will benefit from changing treatment regimen. Known predictors of pCR to NAC include tumour size, histological grade, hormone-receptor status and HER2 status, among others. Univariate analyses showed that HER2 positivity was a strong predictive factor of pCR at baseline. Different treatment regimens and the limited number of patients represent limitations in our study. However, our data indicate that HER2 overexpression may be a positive predictive factor regardless of trastuzumab treatment. HER2-positive patients also had a stronger increase in ADC at Tp1, compared with the HER2-negative patients. ADC increase emerges in our study as an independent predictive factor at Tp1, i.e. before any patients received trastuzumab.

Tumour ADC at Tp1 was significantly higher in the pCR group compared with the non-pCR group. Varying cut-off ADC values at Tp1 were used to investigate pCR prediction. A study has reported sensitivity and specificity of 79% and 80% using a cut-off value of 23.8% in percentage ADC change following 3 cycles of NAC [26]. We found high sensitivity (88%) and specificity (80%) using a cut-off ADC of 1.42 × 10−3 mm2/s. Whereas ADC values at Tp1 were significantly different in the pCR/non-pCR groups, percentage ADC increase was not. Our data indicate that there may be a threshold value for ADC, i.e. that ADC above this threshold is required to obtain pCR prediction. It should be noted that the cut-off ADC will depend on the choice of imaging protocol and b-values.

In our study a single ROI was used for assessment of tumour ADC. Using this approach the sampled region may not be representative of the whole tumour. However, sampling of the whole tumour volume would inevitably include necrosis and neighbouring non-cancerous tissue. We chose to place the ROI within the most viable and solid part of the tumour in order to exclude tumour necrosis and minimise partial volume effects from tumour borders.

At Tp1 MRI assessed tumour size and percentage change in longest diameter and Vcalc significantly correlated with pCR, Vcalc reduction being the strongest predictor. With a cut-off of 83% reduction in Vcalc, sensitivity and specificity for pCR were 91% and 80%, respectively. It should be noted that lower cut-off values may be optimal if evaluation takes place earlier in the course of treatment. Vseg was a less sensitive pCR predictor than Vcalc. After the onset of therapy, contrast enhancement in tumour tissue is altered and the enhancement may resemble that of benign breast tissue. Thus, threshold-based segmentation procedures run the inherent risk of underestimating tumour burden during and after treatment. However, reduction in Vcalc and Vseg showed a significant correlation at Tp1, indicating the potential of time-saving automatic procedures.

The goal of monitoring the effect of NAC should be to discriminate between responders and non-responders early in the treatment and in a reliable way. It may be argued that MRI after 4 cycles is not an optimal time point for response evaluation during treatment. There are numerous studies reporting ADC increases early in the treatment, whereas depiction of significant size reduction has been reported after 2, 3 and 4 cycles [17, 24–26]. Our study protocol was made in collaboration with oncologists in our institution, and MRI after 4 cycles was chosen because of the emphasis on the need for reliable response evaluation. After 4 cycles the treatment regimen may still be changed to optimise the response. Optimal timing for response monitoring will depend on the anticipated treatment response. Larger studies are required to establish the optimal time point for response evaluation.

Conclusion

Our study confirms that MRI is superior to the other techniques and should be the chosen method of response evaluation and prediction of pCR before surgery. After four cycles of NAC, ADC, tumour size and tumour size reduction showed predictive value for pCR, with ADC and calculated tumour volume being the strongest predictors.