Introduction

The apparent diffusion coefficient (ADC) commonly used in quantitative magnetic resonance (MR) diffusion-weighted imaging (DWI) data is known to depend on many factors, including the diffusion coefficients in the various tissue compartments, the geometry of the diffusion environment, membrane integrity, the presence of macromolecules leading to water binding, the cellular volume fraction and the image acquisition parameters. Its link with tumour cell density makes the ADC ideal for monitoring cytotoxic response, where increased ADC is associated with cell death [1]. However, changes in tissue physiology after treatment with cytostatic agents are still being elucidated, and it is possible that such treatments may cause changes to any of the above factors linked to the ADC value.

Although simple and robust, the ADC gives an incomplete picture of the diffusion process in both normal tissues and tumours [2, 3]. The idea behind applying more complex models is that the additional parameters may be sensitive to other treatment effects, and by modelling the diffusion signal more completely, the accuracy of the parameters may be improved and their link with tissue changes may be made more specific.

The contribution of blood flow and perfusion to ADC has been documented in several preclinical studies where changes in perfusion were shown to be correlated with changes in ADC [4, 5]. Since the perfusion effect dominates at low b-values, a perfusion-insensitive ADC can be obtained by fitting the higher b-values only [6]. Alternatively, a bi-exponential model can be fitted to all data, in which the signal at low b-values is associated with an intravoxel incoherent motion (IVIM) effect [7, 8] and the signal at higher b-values with a diffusion effect. There is preliminary evidence from clinical studies that treatment responses in the IVIM parameters, as well as the diffusion parameter, can be detected [9].

Whilst the IVIM model has the advantage that its components can be associated with distinct physical phenomena, in practice there may be insufficient data support for the additional parameters, which leads to poor estimation accuracy. The stretched-exponential model has been applied in a number of tissues [10–12], but has received little attention as an early clinical measure of response. The parameter α in this model is less straightforward to interpret than the IVIM parameters, but may be associated with the heterogeneity of diffusion coefficients [10], or with anomalous diffusion [13] where α < 1 implies the mean-square displacement increases sub-linearly with time.

The aim of this work was to apply three attenuation models to diffusion-weighted MRI data from a phase I population treated with a potent VEGF inhibitor (cediranib) to determine which parameters are sensitive to tumour response at which time-points. This agent is known to cause rapid and sustained changes in vascular physiology, both pre-clinically [14–16] and clinically [17], and these changes have been observed in the patients in this study [18].

Materials and methods

This exploratory, single-center, open-label, nonrandomized study was approved by the local research and ethics committee and informed consent was obtained from all patients before enrolment in the study.

Patient population and treatment schedule

Due to the primary study objectives [18], patients were selected to be representative of a phase I population: all had multiple metastatic disease sites, previously treated using a range of therapies, including chemotherapy and radiotherapy. Patients were deemed eligible if: all conventional lines of treatment had been exhausted; they had a metastatic soft-tissue tumour at least 3 cm in longest diameter; the tumour location was amenable to repeat measurement with DCE-MRI; they were at least 18 years of age; and they had a performance status of 0-1. Patients were ineligible due to: brain or meningeal metastases; poor renal or liver function; poorly controlled hypertension; significant hemoptysis or haemorrhage after recent surgery; allergy to contrast agents containing iodine or gadolinium; metal implants or a pacemaker. The therapy consisted of a daily dose of cediranib taken orally: 13/27 patients started on a 45 mg dose, which was reduced to a 30 mg starting dose for the remaining 14/27 patients (the typical dose used for monotherapy studies). A single dose reduction was allowed during the trial, which led to seven patients reducing from 45 mg to 30 mg, and one patient from 30 mg to 20 mg.

Imaging data were acquired from 29 patients at up to four time points, of whom 27 had diffusion-weighted images available for analysis in this study. Of the two patients without diffusion data, one had a diffusion acquisition that failed due to a technical fault, and the other had a stent that precluded the diffusion examination because of the risk associated with the large diffusion probing gradients.

Response assessment

Antitumour activity was assessed using RECIST (version 1.0), and results are summarized in Table 1.

Table 1 Summary of the best objective response (RECIST) for the 27 patients in this study

Image acquisition

Diffusion-weighted images were acquired without breath-holding, as this is better tolerated and gives predictable scan durations [19]. Abdominal regions are affected by motion, so the impact of the free-breathing acquisition was assessed by grouping patients with abdominal (17/27) and pelvic disease sites (9/27) (one head and neck tumour excluded from both groups).

Baseline imaging was performed at two time points before the first dose, 7 days apart and within 14 days of starting treatment. During treatment, imaging was performed within ±1 day of day 7 and day 28.

Axial DWI data were acquired at 1.5 T (Avanto; Siemens, Erlangen, Germany) using the same multi-slice EPI sequence for all disease sites: 20 × 5 mm slices; 380 × 380 mm FoV; 128 × 128 matrix; 6/8 partial acquisition; TR = 3500 ms; TE = 69 ms; six repeat acquisitions; GRAPPA = 2; SPAIR fat suppression. Six b-values were acquired as three-scan trace images with diffusion weightings of 0, 50, 100, 250, 500, 750 s/mm² and with δ = 23.4 ms and Δ = 31.7 ms. The total acquisition time was just under 7 minutes.

Image analysis

Regions of interest (ROIs) were drawn on the b = 500 s/mm² images to include tissues with high lesion/tissue contrast and an acceptable signal-to-noise ratio (see Fig. 1), which favours selection of “viable tissue,” while avoiding necrotic/cystic regions [20]. ROIs were drawn on all tumour slices, giving full coverage in 19/27 patients; the remaining patients had large tumours that extended outside the 100 mm imaging volume, so only partial coverage was achieved from the 20 slices. Voxel-wise estimates were obtained by non-linear least-squares fitting using three models.

Fig. 1

Diffusion-weighted parameter maps for an example patient with stable disease at RECIST assessment. The left-hand panel shows the b = 500 s/mm² image with the tumour region of interest highlighted. The images in the right-hand panel show parameter maps for the eight measures at three time-points; the corresponding histograms for the whole tumour volume are shown below. Key features of interest are: similar values for ADC, ADC#, DDC and D at baseline and day 7, followed by a decrease at day 28; an increase in α at days 7 and 28; a decrease in D* at day 28 and in f.D* at days 7 and 28, but no changes in f.

Estimates of the apparent diffusion coefficient (ADC) were obtained by fitting a mono-exponential model to all b-values and separately to b ≥ 100 s/mm² (ADC#).
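
As an illustration only, the two mono-exponential estimates can be sketched in Python, assuming the usual form S(b) = S0 exp(-b ADC). The study itself used in-house MATLAB software with non-linear least-squares fitting; the sketch below substitutes a simpler log-linear least-squares fit, and the function name `fit_monoexp_loglinear` and the synthetic voxel values are hypothetical.

```python
import math


def fit_monoexp_loglinear(b_values, signals):
    """Straight-line least-squares fit of ln S against b; returns (S0, ADC)."""
    n = len(b_values)
    log_s = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_y = sum(log_s) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(b_values, log_s))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_b
    return math.exp(intercept), -slope


B_VALUES = [0, 50, 100, 250, 500, 750]  # s/mm^2, as in the acquisition

# Hypothetical noise-free voxel containing a fast, perfusion-like component.
TRUE_F, TRUE_DSTAR, TRUE_D = 0.1, 0.02, 0.001
signals = [
    1000.0 * (TRUE_F * math.exp(-b * TRUE_DSTAR)
              + (1.0 - TRUE_F) * math.exp(-b * TRUE_D))
    for b in B_VALUES
]

# ADC: all b-values. ADC#: only b >= 100 s/mm^2.
_, adc_all = fit_monoexp_loglinear(B_VALUES, signals)
high = [(b, s) for b, s in zip(B_VALUES, signals) if b >= 100]
_, adc_high = fit_monoexp_loglinear([b for b, _ in high], [s for _, s in high])
```

On this synthetic voxel the all-b-value estimate is higher than the b ≥ 100 s/mm² estimate, and the latter lies within a few per cent of the slow diffusion coefficient used to generate the data, which is the behaviour that motivates reporting ADC# separately.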

Estimates of D, D* and f were obtained by fitting the IVIM model [7, 8]

$$ S(b)={S}_0\left(f \exp \left(-b{D}^{*}\right)+\left(1-f\right) \exp \left(-bD\right)\right) $$

with 0 < f < 1, and f.D* was also recorded. This composite parameter has been linked to relative perfusion [31], so it is possible that f.D* (rather than f or D* alone) may show a significant correlation with DCE-MRI-based perfusion measures. In this model, D is the diffusion coefficient, and f and D* are the volume fraction and pseudo-diffusion coefficient thought to be related to a flow effect.
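
A minimal sketch of how a constrained IVIM fit might be set up is given below. It is not the in-house implementation: it assumes a coarse grid search over D and D*, with the two amplitudes solved by linear least squares at each grid point, and it additionally assumes D* > D. The grid ranges, function names and synthetic values are all hypothetical.

```python
import math

B_VALUES = [0, 50, 100, 250, 500, 750]  # s/mm^2, as in the acquisition


def ivim_signal(b, s0, f, d_star, d):
    """IVIM model: S(b) = S0 * (f*exp(-b*D*) + (1 - f)*exp(-b*D))."""
    return s0 * (f * math.exp(-b * d_star) + (1.0 - f) * math.exp(-b * d))


def fit_ivim_grid(b_values, signals, d_grid, d_star_grid):
    """Grid search over (D, D*); returns the best fit, or None if none is admissible."""
    best = None
    for d in d_grid:
        v = [math.exp(-b * d) for b in b_values]
        for d_star in d_star_grid:
            if d_star <= d:
                continue  # assumption: the pseudo-diffusion term decays faster
            u = [math.exp(-b * d_star) for b in b_values]
            # Normal equations for S ~ a*u + c*v, where a = S0*f and c = S0*(1 - f).
            uu = sum(x * x for x in u)
            vv = sum(y * y for y in v)
            uv = sum(x * y for x, y in zip(u, v))
            us = sum(x * s for x, s in zip(u, signals))
            vs = sum(y * s for y, s in zip(v, signals))
            det = uu * vv - uv * uv
            if det <= 0.0:
                continue
            a = (us * vv - vs * uv) / det
            c = (vs * uu - us * uv) / det
            if a <= 0.0 or c <= 0.0:
                continue  # both amplitudes positive enforces 0 < f < 1
            rss = sum((s - a * x - c * y) ** 2 for s, x, y in zip(signals, u, v))
            if best is None or rss < best["rss"]:
                s0 = a + c
                f = a / s0
                best = {"S0": s0, "f": f, "D*": d_star, "D": d,
                        "f.D*": f * d_star, "rss": rss}
    return best


# Hypothetical search grids (mm^2/s) and a noise-free synthetic voxel.
D_GRID = [i * 1e-4 for i in range(1, 31)]
DSTAR_GRID = [j * 1e-3 for j in range(4, 101)]
signals = [ivim_signal(b, 1000.0, 0.1, 0.02, 0.001) for b in B_VALUES]
fit = fit_ivim_grid(B_VALUES, signals, D_GRID, DSTAR_GRID)
```

Because the model is linear in the two amplitudes once D and D* are fixed, only the two decay rates need to be searched. With noisy data and only three b-values below 250 s/mm², many (f, D*) pairs give nearly the same residual, which is one way to picture the limited data support for these parameters.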

Estimates of DDC and α were obtained by fitting the stretched-exponential model [10],

$$ S(b)={S}_0 \exp \left(-{(bDDC)}^{\upalpha}\right) $$

with 0 < α < 1, where DDC is the distributed diffusion coefficient and α is the stretching parameter [10].
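
The structure of this model can be illustrated with a short Python sketch. It swaps the non-linear least-squares fit used in the study for a linearisation, ln(-ln(S/S0)) = α ln(b) + α ln(DDC), and assumes that S0 can be taken directly from the b = 0 signal. The function names and the synthetic values are hypothetical.

```python
import math


def stretched_signal(b, s0, ddc, alpha):
    """Stretched-exponential model: S(b) = S0 * exp(-(b*DDC)**alpha)."""
    return s0 * math.exp(-((b * ddc) ** alpha))


def fit_stretched_loglog(b_values, signals):
    """Linearised fit; returns (S0, DDC, alpha). Assumes S0 = S(b = 0)."""
    s0 = signals[b_values.index(0)]
    xs, ys = [], []
    for b, s in zip(b_values, signals):
        if b > 0 and 0.0 < s < s0:  # points where both logarithms are defined
            xs.append(math.log(b))
            ys.append(math.log(-math.log(s / s0)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                      # slope of the line
    intercept = mean_y - alpha * mean_x    # equals alpha * ln(DDC)
    ddc = math.exp(intercept / alpha)
    return s0, ddc, alpha


B_VALUES = [0, 50, 100, 250, 500, 750]  # s/mm^2, as in the acquisition
signals = [stretched_signal(b, 1000.0, 0.0012, 0.8) for b in B_VALUES]
s0_fit, ddc_fit, alpha_fit = fit_stretched_loglog(B_VALUES, signals)
```

Setting α = 1 in `stretched_signal` recovers the mono-exponential model with DDC in place of ADC. The linearised fit does not enforce 0 < α < 1 and weights noise differently from a fit to the raw signal, so it is only a rough stand-in for the constrained fit described above.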

Median values were reported for each volume, as the tumour parameters are not normally distributed (see Fig. 1), resulting in eight measures for each time point. Image and statistical analyses were performed using in-house software developed with MATLAB (Natick, MA).

Statistical analysis

All statistics were calculated from the logarithm of the parameter and back-transformed where appropriate [21], and the mean of the logarithm of the two baseline measurements was used as the baseline when treatment differences were calculated. Measurement repeatability was assessed with the method of Bland and Altman [22] for all parameters using the coefficient of variation (CoV), defined as \( \mathrm{C}\mathrm{o}\mathrm{V}=100\%\times \sqrt{ \exp \left({\upsigma}^2/2\right)-1} \), where σ² is the variance of the difference between the two log-transformed baseline measurements. Separate measures were calculated for abdominal and pelvic disease sites and differences between these statistics were assessed using F-tests. Non-treatment changes between the baseline measurements were assessed using two-sided paired t-tests.
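
The repeatability calculation can be sketched as follows. This is a minimal illustration rather than the study code: it assumes that σ² is the sample variance (n - 1 denominator) of the per-patient differences of the log-transformed baseline values, and the function name is hypothetical.

```python
import math
import statistics


def repeatability_cov(baseline_1, baseline_2):
    """CoV (%) = 100 * sqrt(exp(var/2) - 1), var = variance of log differences."""
    diffs = [math.log(a) - math.log(b) for a, b in zip(baseline_1, baseline_2)]
    var_diff = statistics.variance(diffs)  # assumption: sample variance
    return 100.0 * math.sqrt(math.exp(var_diff / 2.0) - 1.0)


# Hypothetical per-patient median values at the two baseline visits.
visit_1 = [math.exp(0.1), 1.0, 1.0]
visit_2 = [1.0, 1.0, math.exp(0.1)]
cov_percent = repeatability_cov(visit_1, visit_2)
```

The factor of one half appears because the variance of a difference of two measurements is twice the single-measurement variance, so σ²/2 estimates the within-patient variance on the log scale.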

Cohort baseline values for all parameters were calculated and separate baseline values were calculated for abdominal and pelvic disease sites, and the two sub-groups compared using unpaired two-tailed t-tests. Inter-patient variability was assessed by calculating the variance of the mean of the logarithm of the two baseline values for each patient. These statistics were converted to percentages with \( 100\%\times \sqrt{ \exp \left({\upsigma}^2\right)-1} \). Treatment effects at both time points were calculated as percentages by back-transforming the difference of the logarithm of the parameters. The significance of these changes was assessed with two-sided paired t-tests. Differences in the treatment effects observed for the two disease sites were assessed using unpaired two-tailed t-tests.
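
A sketch of the treatment-effect calculation under these definitions is shown below. The function name and example values are hypothetical, and the p value itself is not computed: only the paired t statistic and its degrees of freedom are returned, from which a p value could be looked up in Student's t distribution.

```python
import math
import statistics


def treatment_effect(baseline_1, baseline_2, post):
    """Returns (percentage change, paired t statistic, degrees of freedom)."""
    diffs = []
    for b1, b2, p in zip(baseline_1, baseline_2, post):
        log_baseline = 0.5 * (math.log(b1) + math.log(b2))  # mean of log baselines
        diffs.append(math.log(p) - log_baseline)
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    percent_change = 100.0 * (math.exp(mean_diff) - 1.0)  # back-transform
    t_stat = mean_diff / (statistics.stdev(diffs) / math.sqrt(n))
    return percent_change, t_stat, n - 1


# Hypothetical values for three patients.
pct, t_stat, dof = treatment_effect(
    [1.0, 2.0, 4.0], [1.0, 2.0, 4.0], [1.1, 2.4, 4.4]
)
```

Working on the log scale means the reported percentage is the change in the geometric mean across patients, so a doubling and a halving cancel exactly.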

Relationships between treatment effects and RECIST classification were explored, although a full statistical evaluation was not possible due to the small sample size of some groups (see Table 1). In addition to the whole cohort treatment effects, treatment effects were also calculated excluding two patients with a partial response in order to indicate whether significant treatment effects are dominated by the partially responding patients (see Table 4). Unpaired two-tailed t-tests were also used to compare treatment effects of patients with progressive disease (8/27) and stable disease (14/27).

At each time-point, the number of voxels inside the previously drawn ROIs was used to measure the viable tumour volume, from which volume changes were calculated at both treatment time-points. Paired t-tests were used to assess the significance of cohort volume changes. Pearson’s correlation coefficient was used to correlate volume changes with changes in the eight DWI measures at both treatment time-points.

Pearson’s correlation coefficient was also used to assess correlations between treatment-induced changes in the eight DWI measures reported here and changes in four of the previously reported DCE-MRI measures (iAUC60, Ktrans, ve and enhancing fraction, EF), giving 32 comparisons at each time-point.

Model comparisons between the various models were made using the Akaike information criterion (AIC) [32]. Firstly, a direct comparison between the two more complex models (stretched-exponential and IVIM) was performed by computing the AIC for both models in every voxel, and the majority vote within each tumour was used as the model preference for that tumour. The proportion of tumours with a preference for each model was then recorded for each time point, and over the trial as a whole. Since the mono-exponential model is a special case of the other two (α = 1 for the stretched exponential model, f = 0 for IVIM), model comparisons with this model were designed to take this into account. Using the AIC, voxels with an overall preference for the mono-exponential model were selected, and Pearson’s correlation coefficient was used to compare ADC with DDC and D in these voxels. In addition, the mean values of α and f were calculated for these voxels.
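
The voxel-wise comparison and majority vote can be sketched as below. The exact form of the AIC used in the study is not stated, so this sketch assumes the common least-squares form AIC = n ln(RSS/n) + 2k, with k counting the fitted parameters of each model (S0 included). All names and the residual values are hypothetical.

```python
import math
from collections import Counter

# Assumed parameter counts: (S0, ADC), (S0, DDC, alpha), (S0, f, D*, D).
N_PARAMS = {"mono": 2, "stretched": 3, "ivim": 4}


def aic_least_squares(rss, n_points, n_params):
    """AIC for a least-squares fit with Gaussian errors; rss must be > 0."""
    return n_points * math.log(rss / n_points) + 2 * n_params


def preferred_model(rss_by_model, n_points=6):
    """Name of the model with the lowest AIC in one voxel."""
    scores = {
        name: aic_least_squares(rss, n_points, N_PARAMS[name])
        for name, rss in rss_by_model.items()
    }
    return min(scores, key=scores.get)


def tumour_preference(voxel_rss_list, n_points=6):
    """Majority vote over the voxels of one tumour."""
    votes = Counter(preferred_model(v, n_points) for v in voxel_rss_list)
    return votes.most_common(1)[0][0]


# Hypothetical residual sums of squares for three voxels of one tumour.
voxels = [
    {"stretched": 10.0, "ivim": 9.5},
    {"stretched": 4.0, "ivim": 5.0},
    {"stretched": 10.0, "ivim": 2.0},
]
winner = tumour_preference(voxels)
```

In the first voxel IVIM has the smaller residual but is still not preferred, because the improvement does not offset the penalty for its extra parameter; this is how the criterion guards against over-parameterised fits when only six b-values are available.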

As this study is primarily descriptive, all p values are presented as raw values, and are not corrected for multiple comparisons at multiple time-points. For the purposes of discussion, multiple comparisons are accounted for by adjusting the p value significance thresholds as follows. For pre-treatment comparisons, a correction factor of 7 is used (eight DWI parameters, excluding the derived parameter f.D*), and for post-treatment comparisons, a correction factor of 14 is used to account for the two time-points. Thus statistically significant comparisons have p values less than 0.05 (no correction), and highly significant comparisons have p values less than 0.007 (Tables 2 and 3) or 0.0036 (Table 4).
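
For reference, the two adjusted thresholds are consistent with dividing the nominal 0.05 level by the correction factor, assuming a simple Bonferroni-style adjustment (the text implies this but does not name the method):

$$ \frac{0.05}{7}\approx 0.0071,\qquad \frac{0.05}{14}\approx 0.0036 $$

The first value is the one quoted as 0.007.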

Table 2 Repeat measures coefficients of variation (%) calculated from the two baseline measurements for the various parameters for all patients and for patients with abdominal and pelvic disease sites. One patient had a head and neck tumour that is included in the “All” column only. p values relate to F-tests comparing the repeat measures variances of the abdominal and pelvic sub-groups. Significant statistics have p < 0.05, while highly significant statistics have p < 0.007 (which includes a correction for multiple comparisons, see text)
Table 3 Average baseline values of the various parameters (all units mm²/s except where shown). Values in parentheses are the inter-patient coefficient of variation reported as a percentage, and the p values are for a two-tailed unpaired t-test comparing the abdominal and pelvic sub-groups. Significant statistics have p < 0.05, while highly significant statistics have p < 0.007 (which includes a correction for multiple comparisons, see text)
Table 4 Percentage changes observed after 7 and 28 days of treatment. Values in parentheses give the p value of a paired t-test between the baseline and post-treatment data. Significant statistics have p < 0.05, while highly significant statistics have p < 0.0036 (which incorporates a correction for multiple comparisons, see text). Data are shown for all patients and for non-responding patients, which excludes the two best responding patients (partial response at RECIST) to assess whether the measured changes are biased by these patients

Results

Repeat measures analysis

As shown in Table 2, the repeat measures CoVs evaluated from the two baseline measurements were less than 5 % for ADC, ADC# and DDC, while for D, it was 6 %. The CoV for f.D* was 6.3 %, while the pseudo-diffusion parameters f and D* were significantly worse at 22 % and 44 %, respectively. The CoV for α was 4.2 %. For all parameters, the CoVs for the pelvic sub-group were lower than the abdominal sub-group, and these differences were significant for ADC# and f (p < 0.05), and highly significant for α (p = 0.0004). p values for differences between the two baseline measurements were > 0.3 for all parameters, indicating no systematic differences between the baseline measures.

Baseline measures

Average baseline values shown in Table 3 are consistent with values appearing in the literature, including those for ADC and ADC# [2, 23], IVIM measures [24–26] and stretched exponential measures [3, 11, 12]. Comparing the pelvic and abdominal sub-groups, there were significant (ADC, DDC) or highly significant (ADC#, D) differences between the baseline diffusion-related parameters.

Treatment related changes

Table 4 shows the average treatment changes at both time-points for all patients and for non-responding patients, and Fig. 1 shows parameter maps for an example patient with stable disease (RECIST). For the whole cohort, treatment changes in ADC, ADC#, D, and DDC were not significant at day 7; changes at day 28 were significant (p < 0.05) for ADC, D and DDC, and highly significant for ADC# (p = 0.0015). Changes in D* and f.D* were highly significant at day 7, but not significant at day 28; changes in f were significant at day 28 only. Changes in α were highly significant and of a similar magnitude at both time points. Treatment changes and significance values were very similar for the non-responding sub-cohort of patients for all parameters, and there were no significant differences between patients with stable and progressive disease. p values comparing treatment effects between the two disease sites were > 0.1 for all parameters at both time-points, indicating that although there were differences in the baseline values of some parameters, treatment changes are not linked with the disease site. For the “viable” tumour volume changes, the mean volume change at day 7 was -5.6 % (p > 0.05), and at day 28, it was -22.5 % (p = 0.01).

Correlation with volume change and DCE-MRI measures

Correlations between changes in the DWI parameters and change in volume were weak. At day 7, three comparisons (ADC, ADC# and DDC) had p < 0.05 for a significant r value, and the most extreme r value was 0.55 (DDC), while at day 28, three had p < 0.05 (α, D and f) and the most extreme r value was -0.51 (α).

As reported in reference [18], the repeatability of the four DCE-MRI measures in these patients was 13.9 % (Ktrans), 15.5 % (iAUC60), 8.6 % (enhancing fraction) and 23.1 % (ve), respectively, with significant reductions in Ktrans and iAUC60 for both dose groups at days 7 (-45 % to -66 %, p < 0.05) and 28 (-45 % to -57 %, p < 0.05) and significant reductions in enhancing fraction for the 45 mg dose group at day 7 (-44 %, p < 0.05) and both dose groups at day 28 (-41 % and -27 %, p < 0.05). No significant changes in ve were observed. Correlations between these DCE-MRI measures and the DWI parameters reported here are weak: of the 32 possible comparisons, eight had p < 0.05 for a significant r value, and of these, only the Ktrans vs. D* correlation at day 7 had r > 0.5 (r = 0.54, p = 0.005).

Model comparisons

The AIC-based model comparisons show that the proportions of tumours preferring the stretched-exponential model at the four time-points were 59 % and 56 % (baseline), 65 % (day 7) and 88 % (day 28) and 66 % overall. The proportion of voxels preferring the mono-exponential model was 39 % and the correlation coefficient between ADC and DDC in these voxels was 0.999, and between ADC and D was 0.41. The average values of α and f in the same voxels were 0.97 and 0.28, respectively. These results suggest that from a model selection perspective, the stretched-exponential model supersedes the mono-exponential model in that it is either equivalent or statistically preferred. Despite a similar theoretical connection between the IVIM and mono-exponential models, these observations do not indicate a similar relationship in practice.

Discussion

The key result of this work is that DW-MRI shows a statistically significant response to treatment effects at both 7 and 28 days of treatment using a VEGF inhibitor, and these changes can be robustly measured using the stretched-exponential model. Effects are observed in ADC, ADC#, DDC and D after 28 days; earlier effects are observed in α, f and D*. The significance of these results is strengthened by the repeatability data from the same patient cohort. The poor repeatability of the IVIM parameters combined with the lack of correlation with the changes in the DCE-MRI measures in the same patients indicates that these data do not support a link between the early diffusion changes and a flow effect. The stretched-exponential parameters α and DDC are highly repeatable and α is sensitive to both early and late effects, and so these data support the wider use of this model in assessing treatment effects beyond those routinely measured with ADC.

At around 4 %, the repeatability of ADC and ADC# is consistent with previously reported studies [2, 27], while the repeatability of D and DDC is slightly weaker (6.0 and 4.8 % respectively), but consistent with previous reports [12, 25, 26]. Repeatability of α is 4.1 %, suggesting that the data support the additional parameter in this model. Repeatability of f and D* is considerably weaker, with values consistent with other studies [25, 26], and in particular with [28]. Although the study in [28] was in cirrhosis patients, the lowest three b-values were the same, which may explain the similarity with the repeatability statistics here since f and D* are primarily (but not exclusively) influenced by the lower b-values. In this study, the repeatability of f.D* is more consistent with the non-IVIM parameters at 6.3 %, which is suggestive of inversely correlated errors between f and D*. Combined with the weak repeatability of f and D* and the AIC-based preference for the stretched-exponential model over IVIM, this is indicative of an over-parameterized model, which may be because the tumours in this study did not have a measurable IVIM effect, or because the size and number of low b-values used in this study were inadequate to detect an IVIM effect.

The images in this study were acquired in free-breathing and without the use of navigator control, as this has been shown to be of limited benefit to the estimation of diffusion model parameters when volume averaged values are of interest [19]. The location-specific CoVs (Table 2) are consistent with the expectation that abdominal sites should be more affected by breathing motion than pelvic sites. Repeatability of all parameters is better for pelvic sites compared with abdominal sites, and this reaches significance for ADC# and f, and is highly significant for α.

Since ROIs were drawn on the b = 500 s/mm² images, changes in the estimated parameters reflect treatment effects in “viable tissues” [20], and avoid fluid areas such as cystic or necrotic regions that may adversely bias the results. Changes in ADC, ADC#, D and DDC were consistent with the mechanism of action of an anti-angiogenic agent: reduced perfusion was observed after 7 days of treatment, indicated by a 40 % decrease in Ktrans and iAUC60 [18], but more than 7 days with reduced perfusion are required to produce an effect on tumour cellularity; hence, no changes were observed in the diffusion measures at day 7. The DCE-MRI perfusion changes are sustained until day 28 and the diffusion parameters show a significant increase at this time-point. These data suggest that DCE-MRI and diffusion-weighted imaging are sensitive to different, complementary aspects of the treatment response that occur at different times.

Changes to ADC# and D are well correlated across patients (r = 0.802, p < 10⁻⁶), indicating that they are reporting on the same tissue characteristics; both are used as perfusion-insensitive diffusion estimates [6, 29]. Changes in DDC at day 28 are highly correlated with changes in ADC and ADC# (r = 0.99, p < 10⁻⁶), indicating that changes in DDC can be interpreted in the same way as ADC and ADC# in these patients. This is also implied by the observation that the stretched-exponential model supersedes the mono-exponential model in these data.

To our knowledge, this is the first report of the detection of early changes by diffusion-weighted imaging, as measured by α, with a VEGF inhibitor. Changes in α are highly significant and of a similar magnitude at day 7 and day 28. Patient-wise correlations between changes in α and the DCE measures are weak: there were no significant correlations at day 7, and the only significant (but weak) correlation at day 28 was with Ktrans, where r = -0.5 (p < 0.05). Because of this lack of correlation, the established interpretation of Ktrans in terms of perfusion and vascular permeability cannot be used to infer the tissue changes detected by α in these data. A more direct link between perfusion and the IVIM parameters f and D* has been suggested, but the poor correlations reported here and similarly in [28, 30] do not support this.

In the first paper applying the stretched exponential model to MR diffusion modelling [10], two interpretations were suggested: either as an anomalous diffusion process, or in terms of a distribution of diffusion coefficients over each voxel. The DWI data alone do not show which interpretation is appropriate in these tumours. However, both interpretations are clinically plausible and relevant, and provide complementary information to existing imaging measures that may be of use in response assessment and monitoring with novel therapeutics.

This study was in a cohort of patients typical of a phase I clinical trial, and whilst the response assessment results (Table 1) concur with previous results using the same therapy [17], a limitation of this study is that the tumour locations, types and treatment histories were very varied, which makes it difficult to draw specific conclusions on the biological causes of the detected changes. Further work is needed to elucidate the tissue effects leading to the observed early changes at DWI, and a more tightly controlled patient cohort would be appropriate for such a study. Although the b-values used in this study are not atypical of those used to measure IVIM effects, more b-values < 200 s/mm² are needed to improve measures to a level where clear conclusions can be drawn.

The statistical results presented here demonstrate that α is a highly repeatable measure that can be used to detect treatment effects in addition to those measured with the diffusion coefficient. However, although the diffusion signal is sensitive to early and late treatment-induced changes, it was not possible to link changes in α or the IVIM parameters with any particular tissue properties, in particular vascular changes. The particular model that is used to quantify diffusion-weighted changes is therefore a mathematical convenience to demonstrate this observation about the diffusion-weighted signal. In this study, the two parameters in the stretched-exponential model were sufficient to completely capture the treatment-induced variability measured with diffusion-weighted imaging using six b-values, and while DDC is linked to the diffusion process, α should be considered as a purely descriptive parameter.