Repeated diffusion MRI reveals earliest time point for stratification of radiotherapy response in brain metastases

An imaging biomarker for early prediction of treatment response could provide a non-invasive tool for better prognostics and individualized management of the disease. Radiotherapy (RT) response is generally related to changes in gross tumor volume that manifest months later. In this prospective study we investigated the apparent diffusion coefficient (ADC), perfusion fraction and pseudo diffusion coefficient derived from diffusion weighted MRI (DW-MRI) as potential early biomarkers for radiotherapy response of brain metastases. A particular aim was to assess the optimal time point for acquiring the DW-MRI scan during the course of treatment, since to our knowledge this important question has not been addressed directly in previous studies. Twenty-nine metastases (N = 29) from twenty-one patients, treated with whole-brain fractionated external beam RT, were analyzed. Patients were scanned with a 1 T MRI system to acquire DW-, T2*W-, T2W- and T1W scans before start of RT, at each fraction and at follow-up two to three months after RT. The DW-MRI parameters were derived using regions of interest based on high b-value images (b = 800 s mm−2). Both volumetric and RECIST criteria were applied for response evaluation. It was found that in non-responding metastases the mean ADC decreased and in responding metastases it increased. The volume based response proved to be far more consistently predictable by the ADC change found at fraction number 7 and later, compared to the linear response (RECIST). The perfusion fraction and pseudo diffusion coefficient did not show sufficient prognostic value with either response assessment criterion. In conclusion this study shows that the ADC derived using high b-values may be a reliable biomarker for early assessment of radiotherapy response for patients with brain metastases. The earliest response stratification can be achieved using two DW-MRI scans, one pre-treatment and one at treatment day 7–9 (equivalent to 21 Gy).


Introduction
The promise of personalized medicine goes hand in hand with the development of biomarkers (a portmanteau of biological markers). Fortunately the list of potential biomarkers is growing rapidly, increasing hope for truly individualized diagnostics and prognostics (ESR 2013). According to common definitions a biomarker has to be an objective and quantifiable characteristic of a biological process (Strimbu and Tavel 2010). Moreover, if a biomarker is to serve as a surrogate for a clinical endpoint (i.e. a type 2 biomarker), for example survival or disease control, there must be solid scientific evidence that it consistently and accurately predicts that clinical outcome. This requires validation through independent comparable studies.
Imaging biomarkers represent a class of biomarkers that are gaining popularity both due to their non-invasiveness and due to technological and biotechnological advancements (biological/physiological MRI, PET tracers, dual energy CT etc) allowing a glance into ever more subtle biological processes at higher resolution. Quantitative diffusion weighted magnetic resonance imaging (DW-MRI) represents a popular and strong candidate for an imaging biomarker of treatment response. DW-MRI is however still not in standard clinical use for individualized prognostics, as MRI is an expensive and complex modality and because clinical studies are not entirely mutually consistent. One reason is that the large number of study designs and MRI settings does not allow a direct comparison. Increased diffusivity has been reported in responding tumors, e.g. Chenevert et al (1997), Mardor et al (2003) and Koh et al (2007a, 2007b). Interestingly, variations in the degree of diffusivity increase are seen (Afaq et al 2010), and even a decrease in diffusivity has been reported in responding tissues (Hein et al 2003). Differences in the timing of the DW-MRI scans may account for some of the observed variations; a quantifiable biological characteristic (biomarker) may perform poorly at one time point during treatment, and well at another set of time points.
In this prospective study we investigated the early biomarker potential of three DW-MRI metrics, apparent diffusion coefficient (ADC), perfusion fraction (f_p) and pseudo diffusion coefficient (D_p), for radiotherapy (RT) response in patients with brain metastases, using the intra-voxel incoherent motion (IVIM) framework (Le Bihan et al 1986). Specifically, the optimal time point for acquiring DW-MRI data during the course of treatment was investigated. To our knowledge this specific question has not been addressed in previous studies, despite the incontestable fact that cellular changes occur immediately after onset of RT (Azzam et al 2012, Barrera 2012) and evolve during the time course of the treatment (Moffat et al 2005), rendering a time and/or dose dependent cellular environment.

Patients
In this prospective study 30 patients were included. Nine patients were excluded from analyses due to early dropout and/or missing follow-up scan. Metastases suspected of hemorrhage or melanin content, and metastases smaller than 0.8 cm3, were excluded. In total twenty-nine (N = 29) brain metastases from 21 patients were analyzed. Included patients were scheduled for palliative intent RT with a total dose of 30 Gy in ten fractions (five fractions/week), delivered as whole brain irradiation with 6 or 15 MV photons. Concomitantly, 150 mg of anti-inflammatory steroid (Prednisolone) was given daily during the course of RT. In all except one patient the Prednisolone course was started prior to the first scan; in the remaining patient it was started the day after the first scan. Exclusion criteria were contraindication to MRI in general (metal implants, claustrophobia etc) and to Gadolinium contrast in particular (renal insufficiency), and life expectancy less than 6 months. Additional patient characteristics are given in table 1. The study was approved by the Danish Scientific Ethical Committee and the Danish data protection authorities (protocol no. H-4-2012-180).

MRI
The study design included twelve imaging sessions. Scan 1 (pre-RT scan) was acquired 0-3 d before start of the RT course, scans 2-11 on each day of the fractionated RT, and scan 12 (follow-up scan) two to three months after the end of the RT course. The pre-RT and follow-up imaging protocol consisted of DW-, T2W/T2*W- and T1W-MRI with gadolinium contrast (DOTAREM, 279.3 mg ml−1, Guerbet, France). Scans 2-11 had a shorter imaging protocol, comprising T2W/T2*W and DW sequences. For three patients the T2*W-MRI scan was not acquired. In all patients a 1 T Philips Panorama MR system (Philips Healthcare, The Netherlands) was used with an eight-channel head coil. No RT fixation mask was used, in order to improve patient comfort. Motion reduction was attained by post-processing using rigid co-registration of DW images acquired at different b-values using the scanner software (MR systems Panorama HFO, Release 3.5, Philips Medical Systems, The Netherlands). Total scan time was about 35 min for the full imaging protocol.
DW-MRI scans were implemented using a single-shot EPI, fat-saturated (diffusion-weighted whole-body imaging with background body signal suppression (DWIBS)) spin-echo (SE) sequence with eight b-values (0, 50, 100, 150, 400, 500, 600, 800 s mm−2) in all three gradient directions, subsequently modulus averaged, number of signal averages (NSA) = 3, with TR/TE 7411 ms/110 ms, flip angle (FA) 90°, matrix 116 × 90, resolution 1.8 mm × 1.8 mm, slice thickness (sl) 4 mm, no gap. T2W sagittal images were acquired in a SE sequence, TR/TE 4387 ms/100 ms, FA 90°, matrix 252 × 238, resolution 1 mm × 1 mm, sl 5 mm, slice gap 1 mm. Transversal T2*W images were acquired with a T2* gradient echo (GRE) sequence, TR/TE 897 ms/21 ms, FA 18°, matrix 232 × 182, resolution 1 mm × 1 mm, sl 5 mm, no gap. At the end of the scan session patients were given a Gd-contrast dose of 0.2 mmol kg−1, and transversal T2W/T1W sequences were obtained: T2W SE with TR/TE 6898 ms/100 ms, FA 90°, matrix 288 × 224, resolution 0.8 mm × 0.8 mm, sl 4 mm, no gap; and T1W 3D GRE with TR/TE 25 ms/6.9 ms, FA 30°, matrix 336 × 267, resolution 0.7 mm × 0.7 mm, sl 1.6 mm.
DW-MRI reproducibility measurements were based on the variation in ADC of the cerebrospinal fluid. The coefficient of variation, calculated individually, yielded a grand mean of about 3%, which is to be considered the magnitude of change that can be confidently detected, without considering other uncertainties.
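The reproducibility figure above can be illustrated with a minimal sketch. It assumes that "calculated individually" means one coefficient of variation per patient across that patient's imaging sessions, and that the grand mean is the plain average of those values; the function names and the CSF ADC numbers are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(values) / mean(values)


def grand_mean_cv(series_per_patient):
    """Average of the per-patient coefficients of variation."""
    return mean(coefficient_of_variation(v) for v in series_per_patient)


# Hypothetical CSF ADC values (in 1e-3 mm^2/s): one list per patient,
# one entry per imaging session. These are NOT data from the study.
csf_adc = [
    [3.00, 3.10, 2.90, 3.05],
    [2.95, 3.02, 3.08, 2.99],
]

cv = grand_mean_cv(csf_adc)
print(f"grand mean CV = {100 * cv:.1f}%")
```

Under this reading, a relative ADC change smaller than the grand mean CV would not be distinguishable from scan-to-scan variation.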

Region of interest (ROI)
Tumor delineations were performed by a radiologist (HHJ) with more than nine years of experience in DW imaging, using a freehand manual contouring tool (Eclipse v.10.0, Varian Medical Systems, Inc., USA). ROIs were drawn as high intensity regions of b = 800 s mm−2 DW images on all axial slices of each metastasis. Co-registered T2W image overlays assisted in avoiding areas with edema and necrosis. Metastases with visible blood and regions with melanin in patients with malignant melanoma were defined with T2*W images and avoided in the ROIs.

Diffusion parameter estimation
The ADC was estimated by a mono-exponential model using all high b-value DW-MRI data (400, 500, 600, 800 s mm−2) (ADC(high)), two high b-values (400, 800 s mm−2) (ADC(400,800)), and b = 0 together with one high b-value (0, 800 s mm−2) (ADC(0,800)). Using high b-values only has previously been reported to be robust and fairly accurate compared to using all b-values in a mono- or bi-exponential fit (Mahmood et al 2015). The perfusion fraction f_p was estimated from the mono-exponential fit for ADC(high), and the pseudo diffusion coefficient D_p from a bi-exponential fit with fixed values of ADC(high) and f_p (attained in the mono-exponential estimation). In this model part of the irreversible signal loss is assumed to originate from the capillary perfusion. At voxel level the water flowing in randomly oriented capillaries mimics free diffusion, a concept known as intravoxel incoherent motion (IVIM) (Le Bihan et al 1986). Parameter estimation was based on the average value of the trace over the three gradient directions, using both mean and median pixel values of the entire ROI (all slices). Calculations were performed with in-house developed Matlab R2010a scripts (The Mathworks Inc., USA). A non-linear least-squares fitting algorithm was used in all cases.
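The two-step estimation described above can be sketched as follows. This is an illustration under stated assumptions, not the in-house Matlab implementation: it assumes the common IVIM signal model S(b)/S(0) = f_p exp(−b D_p) + (1 − f_p) exp(−b ADC), it replaces the non-linear least-squares fit of the high b-value data with a log-linear fit, and it finds D_p by a golden-section search with ADC and f_p held fixed. All function names are hypothetical, and the synthetic signal uses a D_p chosen for illustration.

```python
import math


def ivim_signal(b, adc, f_p, d_p):
    """Assumed bi-exponential IVIM model, normalized so that S(0) = 1."""
    return f_p * math.exp(-b * d_p) + (1.0 - f_p) * math.exp(-b * adc)


def fit_high_b(bvals, signals, b_min=400.0):
    """Step 1: log-linear fit of ln S = ln(1 - f_p) - b * ADC for b >= b_min.

    The slope gives ADC and the y-axis intercept gives 1 - f_p.
    """
    pts = [(b, math.log(s)) for b, s in zip(bvals, signals) if b >= b_min]
    n = len(pts)
    mean_b = sum(b for b, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((b - mean_b) ** 2 for b, _ in pts)
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_b
    return -slope, 1.0 - math.exp(intercept)  # (ADC, f_p)


def fit_d_p(bvals, signals, adc, f_p, lo=1e-3, hi=0.2, iters=100):
    """Step 2: golden-section search for D_p with ADC and f_p fixed."""
    def sse(d_p):
        return sum((ivim_signal(b, adc, f_p, d_p) - s) ** 2
                   for b, s in zip(bvals, signals))

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = lo, hi
    for _ in range(iters):
        c = right - ratio * (right - left)
        d = left + ratio * (right - left)
        if sse(c) < sse(d):
            right = d
        else:
            left = c
    return 0.5 * (left + right)


# Synthetic, noise-free example using the study's eight b-values (s/mm^2).
bvals = [0, 50, 100, 150, 400, 500, 600, 800]
signals = [ivim_signal(b, adc=1.33e-3, f_p=0.15, d_p=0.02) for b in bvals]

adc_high, f_p_est = fit_high_b(bvals, signals)
d_p_est = fit_d_p(bvals, signals, adc_high, f_p_est)
print(adc_high, f_p_est, d_p_est)
```

Fixing ADC and f_p before estimating D_p reduces the bi-exponential fit to a one-parameter problem, which is why a simple one-dimensional search suffices in this sketch.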

Response assessment
Response evaluation criteria in solid tumors (RECIST) were applied according to the latest guidelines (Eisenhauer et al 2008) based on Gd-contrast enhanced T1W images. The volumetric response was based on the same images and used the same numerical thresholds for complete response (CR: all target lesions gone), partial response (PR: at least 30% decrease in sum from baseline), progressive disease (PD: at least 20% growth in sum compared to smallest sum post treatment) and stable disease (SD: all other). Linear measurement of the longest transaxial axis and volume rendering were done with a measuring tool and a freehand manual contouring tool (Eclipse v.10.0, Varian Medical Systems, Inc., USA), respectively.
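The thresholds above translate into a simple decision rule, sketched here for a single lesion. The sketch assumes one post-treatment measurement, so that the baseline value serves as the reference for both the 30% decrease and the 20% growth; the function names are hypothetical. The same rule is applied whether the inputs are longest diameters (RECIST) or volumes (volumetric assessment).

```python
def classify_response(baseline, follow_up):
    """Return 'CR', 'PR', 'PD' or 'SD' for one lesion.

    `baseline` and `follow_up` are both longest diameters or both volumes.
    Assumes a single follow-up, so baseline is the reference for PD as well.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if follow_up == 0:
        return "CR"  # lesion no longer measurable
    change = (follow_up - baseline) / baseline
    if change <= -0.30:
        return "PR"  # at least 30% decrease
    if change >= 0.20:
        return "PD"  # at least 20% growth
    return "SD"      # everything in between


def is_responder(category):
    """CR and PR are grouped as responding, SD and PD as non-responding."""
    return category in ("CR", "PR")


for base, follow in [(10.0, 0.0), (10.0, 6.5), (10.0, 10.5), (10.0, 12.5)]:
    cat = classify_response(base, follow)
    print(base, follow, cat, is_responder(cat))
```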

Statistics
Receiver operating characteristic (ROC) curves were calculated with a built-in binomial regression function in Matlab, with the Statistics Toolbox. The area under the curve (AUC) was calculated using a built-in function (perfcurve) and by manual scripting for validation. Reported optimal cut-off values of ADC were based on the minimum distance from the point (1, 0) (the point at which both sensitivity and specificity are 1) to the fitted ROC curve. An alpha value of 0.05 was used for significance testing of H1: AUC > 0.5.
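A minimal sketch of the AUC and cut-off calculation follows. It differs from the analysis described above in two labeled ways: it uses the empirical ROC points rather than a ROC curve fitted by binomial regression, and it computes the AUC by the rank-based (Mann-Whitney) identity rather than with perfcurve. The function names and the relative ADC values are hypothetical; a lesion is called a responder when its relative ADC is at or above the threshold.

```python
import math


def auc_mann_whitney(pos, neg):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(pos, neg):
    """Empirical (threshold, sensitivity, specificity) triples."""
    points = []
    for t in sorted(set(pos) | set(neg)):
        sensitivity = sum(p >= t for p in pos) / len(pos)
        specificity = sum(n < t for n in neg) / len(neg)
        points.append((t, sensitivity, specificity))
    return points


def optimal_cutoff(pos, neg):
    """Threshold closest to the ideal point (sensitivity = specificity = 1)."""
    best = min(roc_points(pos, neg),
               key=lambda r: math.hypot(1.0 - r[1], 1.0 - r[2]))
    return best[0]


# Hypothetical relative ADC values (ADC at a given fraction / pre-RT ADC).
responders = [1.06, 0.99, 0.94, 1.10, 1.04]
non_responders = [0.93, 0.90, 0.97, 0.92]

auc = auc_mann_whitney(responders, non_responders)
cutoff = optimal_cutoff(responders, non_responders)
print(auc, cutoff)
```

With few lesions per group the empirical ROC curve is a coarse staircase, which is one reason a fitted curve may be preferred for reporting cut-off values.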

Results
Three metastases showed complete response (CR) with both RECIST and volumetric evaluation. Eleven metastases showed partial response (PR) with RECIST and sixteen with volumetric evaluation. Thirteen metastases showed stable disease (SD) with RECIST versus seven with volumetric evaluation, and two versus three metastases showed progressive disease (PD), respectively (figure 1). No correlation between primary disease and response category was observed. Tumor phenotypes did not show correlation to response either.
CR and PR were grouped as responding metastases and showed an average increase in ADC of about 6% at the final part of the treatment course when the volumetric response assessment method was applied. PD and SD were grouped as non-responding metastases and showed an average decrease in ADC of about 7% at the final part of treatment (figure 2). Using the RECIST criteria, the non-responding metastases showed a smaller and less monotonic decrease in ADC, and the responding metastases showed a smaller increase in ADC, although with the same overall temporal features. That is, the ADC based distinction between responding and non-responding metastases was much harder when the RECIST criteria were applied and, unlike for the volumetric criteria, the ADC developments were non-monotonic after fraction 2. The average ADC showed an initial dip (at fraction 2) in both groups. An example of IVIM parameter fit is shown in figure 3.
The differentiation between the CR and PR group (responding metastases) and the PD and SD group (non-responding metastases) becomes more distinct and constant from fraction number seven, confirmed by the steep increase in AUC of the ROC curves at scan seven (figure 4). Cut-off values of the relative ADC vary between about 0.9 and 1.0 (corresponding to 0-10% reduction in ADC) (table 2). The common initial dip in ADC in both categories of metastases accordingly results in AUC values of about 0.5, corresponding to zero prognostic value.
The perfusion fraction showed a very small increase with no distinct features in either group of metastases (figure 5). The pseudo diffusion data were very noisy in general, with no capacity to differentiate between responding and non-responding metastases either. (The obvious data outlier at fraction two was not seen if the median of ADC was used.) AUC analyses of ADC derived from a mono-exponential fit to the b-value pairs (b = 0 and b = 800 s mm−2) and (b = 400 and b = 800 s mm−2) resulted in an AUC progression with a similar trend to that based on ADC derived from the full range of high b-values (b = 400, 500, 600, 800 s mm−2). ADC derived from (b = 0 and b = 800 s mm−2), however, resulted in fewer time points at which the AUC was significantly larger than 0.5.

Discussion
One of the greatest challenges to widespread use of diffusion weighted MRI as a standard clinical tool for treatment response prediction is the lack of standardization, ranging from the choice of MRI protocol and ROI strategy to the choice of treatment response criteria and scan timing. Although recommendations have been published (Padhani et al 2009), implementation and general consensus are lacking. Other reasons may include radiotherapy departments' limited access to clinical MRI scanners and to qualified MRI radiographers and physicists. The scientific evidence is also not entirely consistent, mainly because the number of variables in each clinical study is very large and varies between studies, e.g. tumor histology, radiation dosage, concomitant chemotherapy, MRI sequence, data post-processing etc. Consequently, neither the studies nor their conclusions are directly comparable (Buyse et al 2010).
In this study the metrics ADC, perfusion fraction and pseudo diffusion coefficient were investigated to evaluate their prognostic value and to propose an optimal time point for the DW-MRI scan. It was found that the volume based response evaluation was far more consistently predictable by the ADC change found at fraction number 7 and later, compared to the linear response (RECIST). The perfusion fraction and pseudo diffusion coefficient did not show sufficient prognostic value with either response assessment method. These observations suggest that the ADC change is the only reliable parameter in a standard clinical setting (in a protocol comparable to ours), and should be acquired at fraction number 7 (equivalent to 21 Gy) or later in patients with brain metastases. Enhanced internal perfusion of brain metastases and surrounding edema can cause all DW-MRI derived parameters to become heterogeneous. In particular, since a bi-exponential fit is required to estimate the pseudo diffusion, the estimation becomes especially sensitive to noise and tissue heterogeneity. Adding more b-values in the range 0-200 s mm−2, as well as a higher field strength of the scanner, may improve parameter estimation, but probably not significantly if tissue complexity has a higher impact. A voxel-by-voxel approach to estimation of the pseudo diffusion (and other IVIM parameters) may resolve the tissue heterogeneity issue, but at the expense of the signal-to-noise ratio. Since the signal originating from the pseudo diffusion components in the ROI is lost at high b-values (>200 s mm−2) (Le Bihan et al 1989), tissue perfusion heterogeneity does not affect the estimation of ADC, which is essentially the averaged water diffusion occurring in the extracellular water compartments.
The use of the RECIST criteria is currently the only standardized method for tumor response assessment. Its use is highly questionable in non-spherical lesions (Cademartiri et al 2008), since a change in the longest trans-axial diameter does not necessarily correlate to a change in tumor size and even less to the change in the number of neoplastic cells. More than a third of the lesions in the present study were irregularly shaped. A closer examination of these lesions revealed poor correlation between volume and longest trans-axial diameter, and in a few cases negative correlation. The response assessment based on volumetric change is clearly a better surrogate for change in the number of neoplastic cells and, as shown in this study, serves as a far more predictable endpoint. Transferring the numerical bounds from RECIST, being a linear measure, to the volumetric measurements is not as obvious as it might appear, since a fractional increase in volume does not have a one-to-one relationship with a linear increase in any direction, but typically varies among the possible directions of growth. Nevertheless, in order to simplify the analysis and in the absence of acknowledged and verified alternatives, we maintained the numerical bounds from RECIST in the volumetric analysis. Better guidelines for assessment of treatment response of irregularly shaped lesions are clearly desired, but volumetric evaluation is not necessarily the answer (Marcus et al 2009).
Figure 3. ... and perfusion fraction (f_p). This example shows normalized data for a brain metastasis in a 66-year-old female breast cancer patient. Measured data are marked by x; the dotted line is a mono-exponential fit (root mean squared error (rmse) = 0.0050) using high b-values to determine ADC(high) and f_p (1 − f_p being the intercept with the y-axis). The solid line is a bi-exponential fit (rmse = 0.0055) with fixed values of ADC(high) and f_p to determine D_p. Estimated values for this patient: ADC(high) = 1.33 × 10−3 mm2 s−1, f_p = 0.15, D_p = 6.68 × 10−3 mm2 s−1.
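The remark that the linear RECIST bounds do not transfer one-to-one to volumes can be made concrete with a short worked calculation. It assumes an idealized spherical lesion that stays spherical, which is an assumption introduced here for illustration only; the irregular lesions in this study do not satisfy it.

```latex
% Idealized sphere of diameter d: the volume scales with the cube of d.
V = \frac{\pi}{6}\, d^{3}
\quad\Longrightarrow\quad
\frac{V_{1}}{V_{0}} = \left(\frac{d_{1}}{d_{0}}\right)^{3}

% RECIST bounds on the diameter expressed as volume changes:
\frac{d_{1}}{d_{0}} = 0.70 \;\Rightarrow\; \frac{V_{1}}{V_{0}} = 0.70^{3} = 0.343
\quad (65.7\%\ \text{volume decrease})
\qquad
\frac{d_{1}}{d_{0}} = 1.20 \;\Rightarrow\; \frac{V_{1}}{V_{0}} = 1.20^{3} = 1.728
\quad (72.8\%\ \text{volume increase})

% The same numerical bounds applied to the volume, expressed as diameter changes:
\frac{V_{1}}{V_{0}} = 0.70 \;\Rightarrow\; \frac{d_{1}}{d_{0}} = 0.70^{1/3} \approx 0.888
\qquad
\frac{V_{1}}{V_{0}} = 1.20 \;\Rightarrow\; \frac{d_{1}}{d_{0}} = 1.20^{1/3} \approx 1.063
```

Under this assumption, applying the 30% and 20% bounds directly to volumes corresponds to diameter changes of only about 11% and 6%, so the volumetric criteria with unchanged bounds are considerably more sensitive than the linear ones.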
The observed increased ADC in responders is consistent with what has been reported by others (e.g. Chenevert et al 1997, Mardor et al 2003, Koh et al 2007a, 2007b, Li et al 2012, Blackledge et al 2014). A single study was found reporting IVIM parameters in a brain metastasis (Federau et al 2014). Increase in all IVIM parameters (f_p, D_p, ADC) was observed in the single scan acquired prior to therapy, hence comparison to our results is difficult. In a study by Kim et al (2014) on IVIM parameters following radiosurgery, a significant increase was reported in the 10th percentile histogram cutoff for ADC in responding metastases, as compared to recurrent tumor, 6 months after radiosurgery. A few studies also report decreased ADC in responding lesions, explaining it as being caused by radiation induced fibrosis (Hein et al 2003), and a recent study on brain metastases was unable to provide an explanation for the decrease (Jakubovic et al 2016). In radiotherapy the earliest increase in ADC has been reported at one week after initiating therapy in brain tumors (Mardor et al 2003), basically in agreement with the present study. Some studies report that low pre-treatment (absolute) ADC values correlate with better response to treatment and that high pre-treatment ADC values predict poor response to treatment (Dzik-Jurasz et al 2002, Mardor et al 2004, Koh et al 2006). One explanation may be that tumors with high pre-treatment ADC values are likely to be more necrotic than those with low values (Koh et al 2007a). However, as the change in ADC is not measured in these studies, a comparison to our results is not possible. The observed dip in ADC soon after start of the RT course, seen in both responders and non-responders, may be due to cell swelling (cytotoxic edema), as this has been suggested as a cellular reaction to radiation and other traumatic events (Chenevert et al 2000).
At the time of the follow-up scan ADC seems to have normalized, but is also prone to noise in the 'CR and PR' group since the tumor volumes by definition have shrunk or disappeared completely. For the 'PD and SD' group the variation in ADC may also be large at the time of follow-up, due to heterogeneous tissue reactions (Le Bihan et al 1986), for example edema and increased proliferation rate. For this reason the predictive capacity at the time of follow-up is also very low (AUC about 0.5). AUC data revealed that ADC derived using only two b-values is feasible if the scan time needs to be kept at a minimum. With the MRI protocol used in this study the scan time of the diffusion sequence can be reduced from about 8.5 min (b = 400, 500, 600, 800 s mm−2) to about 3.5 min (b = 400, 800 s mm−2) and 2.5 min (b = 0, 800 s mm−2), respectively. The standard choice is often to use b = 0 and a high b-value (mostly b = 1000 s mm−2). Using low b-values has been reported to clearly overestimate the ADC (Kallehauge et al 2010) and also to introduce a higher degree of variation (Mahmood et al 2015). This is probably due to perfusion effects at b-values below about 200 s mm−2 (Le Bihan et al 1989). Hence the recommendation should be that b = 400, 800 s mm−2 is used if only ADC needs to be estimated and used as a biomarker for treatment response. If minimizing scan time is not the object of optimization, a more reasonable comparison of different sets of b-values requires that the overall scan time is kept constant.
The purpose of this study was to investigate DW-MRI derived metrics to find their potential as early biomarkers for treatment response and especially to address the question of when to acquire the scans. Although this purpose has been reasonably fulfilled, important fundamental questions still remain. It is for example unknown whether the ADC change observed in the responder and non-responder groups, respectively, is a pure dose response or whether time is a confounding factor. For this reason, and due to histological differences between brain metastases and other tumor sites, the results of this study cannot be applied generally in the clinic or in clinical studies, but they can serve as qualified help in designing future studies. For further investigation of brain metastases we would also suggest stratification for primary disease, which however requires larger sample sizes. The aim of our next clinical trial is to include patients receiving stereotactic radiosurgery (SRS) to investigate the possible effect of time on ADC change. Also, based on the present findings, an interventional trial will be conducted to study the benefit of adjuvant conformal boost treatment to patients with non-responding metastases.

Conclusion
This study shows that the ADC derived using high b-values may be a reliable biomarker for early assessment of radiotherapy efficiency for patients with brain metastases. The linear RECIST criteria used as discriminator correlate considerably worse with ADC variation than a volume based evaluation using the same numerical bounds. The perfusion fraction and pseudo diffusion coefficient do not differentiate between responding and non-responding metastases. The highest prognostic values are achieved using the relative ADC change found at fractions 7–10, suggesting that the earliest response stratification can be achieved using two DW-MRI scans, one pre-treatment and one at treatment day 7–9 (equivalent to 21 Gy).