Evaluation of diffusion-weighted MRI and (18F) fluorothymidine-PET biomarkers for early response assessment in patients with operable non-small cell lung cancer treated with neoadjuvant chemotherapy

Objective: To correlate changes in the apparent diffusion coefficient (ADC) from diffusion-weighted (DW)-MRI and standardised uptake value (SUV) from fluorothymidine (18FLT)-PET/CT with histopathological estimates of response in patients with non-small cell lung cancer (NSCLC) treated with neoadjuvant chemotherapy and to track longitudinal changes in these biomarkers in a multicentre, multivendor setting. Methods: 14 patients with operable NSCLC recruited to a prospective, multicentre imaging trial (EORTC-1217) were treated with platinum-based neoadjuvant chemotherapy. 13 patients had DW-MRI and FLT-PET/CT at baseline (10 had both), 12 were re-imaged at Day 14 (eight dual-modality) and nine after completing chemotherapy, immediately before surgery (six dual-modality). Surgical specimens (haematoxylin-eosin and Ki67 stained) were used to estimate the percentage of residual viable tumour/necrosis and the proliferation index. Results: Despite the small numbers, significant findings were possible. ADCmedian increased (p < 0.001) and SUVmean decreased (p < 0.001) significantly between baseline and Day 14; changes between Day 14 and surgery were less marked. All responding tumours (>30% reduction in unidimensional measurement pre-surgery) showed an increase at Day 14 in ADC75th centile and a reduction in total lesion proliferation (SUVmean × proliferative volume) greater than established measurement variability. Change in imaging biomarkers did not correlate with histological response (residual viable tumour, necrosis). Conclusion: Changes in ADC and FLT-SUV following neoadjuvant chemotherapy in NSCLC were measurable by Day 14 and preceded changes in unidimensional size but did not correlate with histopathological response. However, the magnitude of the changes and their utility in predicting (non-) response (tumour size/clinical outcome) remain to be established. Advances in knowledge: During treatment, ADC increase precedes size reductions, but does not reflect histopathological necrosis.

https://doi.org/10.1259/bjro.20190029

Introduction
Non-small cell lung cancer (NSCLC) stage II and occasionally IIIA is routinely managed with surgery and adjuvant chemotherapy. 1 Persistently poor survival rates 2 have driven the use of induction (neoadjuvant) chemotherapy to reduce pre-surgical tumour burden 3-5 ; results are favourable and equivalent to those where adjuvant chemotherapy is used. 6,7 During neoadjuvant chemotherapy, response assessment relies on size-based (RECIST) measurements, 8 which often only decrease 2-3 months after treatment initiation. 9 Moreover, improved survival may be seen without a reduction in tumour size because of decreased metastatic propensity with cytostasis and post-treatment oedema. 9,10 Therefore, early indicators of treatment response sensitive to biological changes in the tumour are desirable. Imaging techniques, such as diffusion-weighted MRI (DW-MRI) and PET, probe the tumour microenvironment and provide this potential.
DW-MRI provides an apparent diffusion coefficient (ADC), a biomarker of cellularity, 11 which has been shown to increase in many cancers on treatment due to cell death. 12,13 (18F)-fluorothymidine (FLT) uptake on PET is also a promising biomarker for treatment response evaluation since it is related to tumour cell proliferation. 14,15 Some observational studies indicate that early reduction in FLT uptake correlates with later changes in tumour size, 16-18 whereas others do not. 19 This study was part of a larger programme validating imaging biomarkers for early response assessment in drug development. 20,21 ADC and FLT uptake have not previously been jointly evaluated with reference to histopathology in NSCLC. We hypothesised that they were related to pathological indices of response (ADC to necrosis and FLT standardised uptake value (SUV) to proliferation) and correlated their changes with histological measures of response (necrosis) and non-response (residual proliferative activity). We also studied longitudinal patterns of ADC and FLT-SUV change and linked them to RECIST-based size criteria for response and non-response. This international multicentre study was performed under quality-controlled conditions for multimodality, quantitative imaging 22,23 to ensure comparable measurements on multiple scanners at multiple sites.
Patients received platinum-based neoadjuvant chemotherapy without pemetrexed. They had DW-MRI and FLT-PET/CT between 1.10.2015 and 06.04.2017 at three time-points: i) baseline, before chemotherapy; ii) 14 days post-treatment and iii) within 1 week of surgery, 3-5 weeks after completing chemotherapy. Figure 1 summarises the study design.

MRI
Patients were scanned on four clinical 1.5 T MR platforms [Optima (GE Healthcare, Waukesha, WI); Avanto (Siemens AG, Erlangen, Germany); two Achieva (Philips Healthcare, Best, The Netherlands)] using phased-array body coils. All centres performed regular test-object scans for quality assurance (QA)/quality control (QC) and had previously participated in a technical validation phase to establish measurement reproducibility when performed in accordance with prescribed QA procedures. 24,25 Axial T1-weighted turbo spin-echo images in breath-hold and three-dimensional T2-weighted turbo spin-echo images with variable flip-angle were obtained as per local protocols.
DW-MRI was acquired during free-breathing using a single-shot echo-planar technique with short-tau inversion recovery fat suppression. Two blocks of 30 slices were acquired through the chest (thickness 5 mm, no gap), field-of-view 320 × 280 mm, four signal averages or four acquisitions, three b-values (100, 500, 800 s/mm²).
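The text does not state how ADC maps were computed from the three b-values. As an illustration only, the sketch below assumes the common mono-exponential model S(b) = S0·exp(−b·ADC) and estimates ADC for one voxel as the negative slope of a least-squares line through ln S against b. The function name `fit_adc` and the signal values are hypothetical and are not taken from the study.

```python
import math

def fit_adc(b_values, signals):
    """Estimate ADC (mm^2/s) for one voxel, assuming S(b) = S0 * exp(-b * ADC).

    ADC is minus the least-squares slope of ln(S) plotted against b.
    """
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    return -sxy / sxx

# b-values from the acquisition protocol (s/mm^2); signals are synthetic.
b_values = [100, 500, 800]
true_adc = 1.2e-3  # hypothetical tissue ADC in mm^2/s
signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]
adc = fit_adc(b_values, signals)
```

Because the lowest b-value is 100 s/mm² rather than 0, a fit of this kind would be less influenced by perfusion effects at very low b-values, although the study's actual fitting method may differ.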

FLT PET/CT
FLT was synthesised either in-house (Humanitas Clinical and Research Center, and VUMC) or by an external supplier (PETNET Solutions Incorporated, Nottingham, UK). PET/CT was performed on a Discovery 690 (General Electric Healthcare, Waukesha, WI), Siemens Biograph 128m and Biograph 64 or Philips Gemini TF TOF 64 scanner in patients fasted for 4 h. Images were acquired without respiratory gating in a single bed position 60 min after injection of a 4 MBq/kg FLT bolus. A non-contrast, non-breath-hold CT scan was also obtained. PET/CT images were corrected for attenuation using low-dose CT scans according to EANM guidelines. 26 Scanner accreditation through the EARL initiative (EANM Research Ltd; earl.eanm.org) guaranteed the reproducibility and comparability of measurements across each patient's scans and among centres.
Imaging biomarkers analysis
Each patient was imaged using the same scanner across all visits. Central review of images was performed. EARL-approved reconstruction was used for PET/CT image analysis. ADC and SUV were calculated at baseline (ADC1, SUV1), Day 14 (ADC2, SUV2), and pre-surgery (ADC3, SUV3).
For DW-MRI, ROIs were manually segmented on computed b = 800 s/mm² images with anatomical reference to T1-weighted, T2-weighted, ADC and b = 100 s/mm² images on all slices with identifiable tumour. ROIs were copied onto corresponding ADC maps in OsiriX. Whole-tumour ADC histograms were generated. The median ADC (ADCmedian, primary endpoint), the 25th and 75th centiles and the interquartile range (IQR) were recorded at each time-point. DWI-derived lesion volume at each time-point was estimated by multiplying the total number of voxels by the voxel dimensions.
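The histogram summary and volume estimate described above can be sketched as follows. This is a minimal illustration, not the OsiriX workflow used in the study: the function names, the quantile interpolation rule and the voxel dimensions are assumptions.

```python
import statistics

def adc_histogram_summary(adc_values):
    """Return the 25th centile, median, 75th centile and IQR of whole-tumour ADC.

    Uses the 'inclusive' quantile rule of the standard library; the study's
    software may interpolate centiles differently.
    """
    p25, median, p75 = statistics.quantiles(adc_values, n=4, method="inclusive")
    return {"p25": p25, "median": median, "p75": p75, "iqr": p75 - p25}

def lesion_volume_ml(n_voxels, voxel_dims_mm):
    """Volume = number of ROI voxels x voxel dimensions, converted to mL."""
    dx, dy, dz = voxel_dims_mm
    return n_voxels * dx * dy * dz / 1000.0

# Hypothetical ADC values (x10^-3 mm^2/s) pooled from all slices of one lesion.
summary = adc_histogram_summary([1.0, 2.0, 3.0, 4.0, 5.0])
# Hypothetical in-plane resolution; only the 5 mm slice thickness is from the protocol.
volume = lesion_volume_ml(1000, (2.5, 2.5, 5.0))
```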
For FLT-PET, all lesions were delineated using a 50% isocontour of the SUVpeak corrected for local background to derive tumour volumes of interest (VOI), designated as the proliferative volume. SUV was calculated within the VOI. SUVmean (primary endpoint), SUVmax, SUVpeak and total lesion proliferation (TLP, the product of SUVmean and proliferative volume) were recorded.
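A rough sketch of the PET metrics follows. It assumes that "50% isocontour of the SUVpeak corrected for local background" means a threshold halfway between the local background and SUVpeak, and it thresholds a flat list of voxel values rather than growing a connected region, so it simplifies what delineation software would do. All names and numbers are hypothetical.

```python
def isocontour_threshold(suv_peak, suv_background, fraction=0.5):
    """Background-corrected threshold: background + fraction * (peak - background)."""
    return suv_background + fraction * (suv_peak - suv_background)

def total_lesion_proliferation(voxel_suvs, threshold, voxel_volume_ml):
    """Return (SUVmean, proliferative volume in mL, TLP) for voxels at or above threshold.

    TLP is the product of SUVmean and proliferative volume, as defined in the text.
    """
    voi = [v for v in voxel_suvs if v >= threshold]
    if not voi:
        raise ValueError("no voxels above threshold")
    suv_mean = sum(voi) / len(voi)
    volume_ml = len(voi) * voxel_volume_ml
    return suv_mean, volume_ml, suv_mean * volume_ml

# Hypothetical values: SUVpeak 8.0, local background 1.0, 0.5 mL voxels.
threshold = isocontour_threshold(8.0, 1.0)
suv_mean, volume_ml, tlp = total_lesion_proliferation(
    [1.0, 2.0, 4.0, 6.0, 8.0], threshold, voxel_volume_ml=0.5
)
```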

Tumour response assessment
Percentage changes in imaging biomarkers at Day 14 (ΔADCearly, ΔSUVearly) and pre-surgery (ΔADCpost, ΔSUVpost) relative to baseline values were calculated as
$$\Delta\mathrm{ADC}_{\mathrm{early}} = 100\% \times \frac{\mathrm{ADC}_2 - \mathrm{ADC}_1}{\mathrm{ADC}_1}, \qquad \Delta\mathrm{SUV}_{\mathrm{early}} = 100\% \times \frac{\mathrm{SUV}_2 - \mathrm{SUV}_1}{\mathrm{SUV}_1},$$
$$\Delta\mathrm{ADC}_{\mathrm{post}} = 100\% \times \frac{\mathrm{ADC}_3 - \mathrm{ADC}_1}{\mathrm{ADC}_1}, \qquad \Delta\mathrm{SUV}_{\mathrm{post}} = 100\% \times \frac{\mathrm{SUV}_3 - \mathrm{SUV}_1}{\mathrm{SUV}_1}.$$
Response between baseline and surgery was evaluated on CT images using the longest unidimensional axis of the primary lesion, as defined by RECIST 1.1 criteria (reduction in target lesion diameter >30%). 8 Patients with appearance of new lesions who did not have further imaging were also classed as non-responders. The DWI and FLT images were not used to re-stage the patients, as this was outside the primary aim.
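The percentage-change definitions and the size-based response rule can be expressed compactly. The sketch below follows the >30% diameter reduction stated in the text; the function names and example measurements are hypothetical.

```python
def percent_change(baseline, follow_up):
    """100% x (follow-up - baseline) / baseline, as used for the ADC and SUV changes."""
    return 100.0 * (follow_up - baseline) / baseline

def is_responder(baseline_diameter_mm, presurgery_diameter_mm, new_lesions=False):
    """Responder if the longest diameter falls by more than 30% and no new lesions appear."""
    if new_lesions:
        return False
    return percent_change(baseline_diameter_mm, presurgery_diameter_mm) < -30.0

# Hypothetical example: ADCmedian rising from 1.0 to 1.25 (x10^-3 mm^2/s) by Day 14.
delta_adc_early = percent_change(1.0, 1.25)
```

For example, a lesion shrinking from 50 mm to 30 mm (a 40% reduction) would be classed as responding, whereas a reduction from 50 mm to 40 mm would not.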
Histopathology
Tumours were assessed on haematoxylin-eosin (HE) stained slides from surgical resection specimens by a central review pathologist (μ). Tumour surface area (mm²) was calculated on a maximum of 10 representative slides per lesion, along with areas of viable tumour, necrosis and fibrosis. Total tumour surface area was calculated by summing values from all slides; total viable tumour surface area and total necrotic tissue area were obtained similarly. The percentage of viable tumour for each lesion was estimated as 100% × (total viable tumour surface area)/(total tumour surface area), and the percentage of necrosis as 100% × (total necrotic tissue surface area)/(total tumour surface area). Tumour cell proliferation was assessed by immunohistochemistry (Ki67 staining). 27 Positive nuclei were counted on each slide in at least 100 cells in the representative areas of the tumour and expressed as a percentage. The mean Ki67 index from all reviewed slides was the final sample score.
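The lesion-level pathology percentages described above amount to summing areas over slides before taking the ratio. A minimal sketch, with hypothetical field names and slide measurements:

```python
def tissue_percentages(slides):
    """Percentage of viable tumour and necrosis for one lesion.

    Areas (mm^2) are summed over all slides first, then expressed
    relative to the total tumour surface area.
    """
    total = sum(s["tumour_mm2"] for s in slides)
    viable = sum(s["viable_mm2"] for s in slides)
    necrosis = sum(s["necrosis_mm2"] for s in slides)
    return 100.0 * viable / total, 100.0 * necrosis / total

def mean_ki67_index(slide_percentages):
    """Final Ki67 score: mean of the per-slide percentages of positive nuclei."""
    return sum(slide_percentages) / len(slide_percentages)

# Hypothetical measurements for a lesion assessed on two slides.
slides = [
    {"tumour_mm2": 100.0, "viable_mm2": 40.0, "necrosis_mm2": 30.0},
    {"tumour_mm2": 300.0, "viable_mm2": 60.0, "necrosis_mm2": 170.0},
]
pct_viable, pct_necrosis = tissue_percentages(slides)
ki67 = mean_ki67_index([20.0, 40.0])
```

Summing areas before dividing weights each slide by its tumour area, which differs from averaging per-slide percentages.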

Statistical considerations
Co-primary imaging endpoints were ΔADCearly defined on ADCmedian and ΔSUVearly defined on SUVmean; alternative ADC and SUV measures were secondary endpoints. The primary pathology endpoint was the percentage of viable tumour cells; percentage of necrosis and proliferative activity (Ki67 index) were secondary endpoints. Responder status was based on unidimensional size and included patients who did not undergo surgery. A total of 31 lesions was needed to demonstrate, with 95% one-sided confidence and 90% power, that the absolute correlation between the imaging biomarker change and the pathological response was >0.5 (H0: ρ ≤ 0.5; H1: ρ > 0.5) if the true correlation was 0.8.
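The order of magnitude of this sample size can be checked with the standard Fisher z approximation for testing a correlation against a non-zero null value. The formula below is an assumption about the method (the text does not give the calculation), and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def lesions_needed(rho_null, rho_alt, alpha, power):
    """Approximate n for a one-sided test of H0: rho <= rho_null.

    Uses n = ((z_alpha + z_beta) / (atanh(rho_alt) - atanh(rho_null)))^2 + 3,
    the usual Fisher z approximation (an assumed method, not the trial's stated one).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    effect = math.atanh(rho_alt) - math.atanh(rho_null)
    return ((z_alpha + z_beta) / effect) ** 2 + 3.0

n = lesions_needed(rho_null=0.5, rho_alt=0.8, alpha=0.05, power=0.90)
```

Under these assumptions `n` evaluates to a little over 31, in line with the 31 lesions quoted; the exact figure depends on the approximation and rounding convention used.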
Correlations between continuous endpoints were assessed using the Spearman rank correlation test with Fisher transformation and assumed to be positive between ΔSUVearly and %viable tumour (ρ0 = 0.5), and negative between ΔADCearly and %viable tumour (ρ0 = −0.5). Equivalent two-sided 90% confidence intervals are reported to check futility.
The one-sided type I error was fixed at 1% for all secondary and exploratory analyses. When relevant, equivalent two-sided 98% confidence intervals are reported.
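A confidence interval for a correlation coefficient via the Fisher transformation can be sketched as below. A two-sided 90% interval corresponds to the one-sided 5% level of the primary analysis and a two-sided 98% interval to the one-sided 1% level. The standard error 1/√(n − 3) is the textbook form; whether the study applied a Spearman-specific variance correction is not stated, and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level):
    """Two-sided confidence interval for a correlation r from n lesions.

    Transform with atanh, add/subtract a normal quantile times 1/sqrt(n - 3),
    then back-transform with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Hypothetical example: an observed correlation of 0 in 8 lesions.
lower90, upper90 = fisher_ci(0.0, 8, 0.90)
lower98, upper98 = fisher_ci(0.0, 8, 0.98)
```

With small n the interval is wide, so futility is judged by whether the interval still reaches the hypothesised correlation.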
Longitudinal comparisons of ADC, SUV and volumes were analysed with a two-sided Wilcoxon signed-rank test.

Results
14 patients were successfully screened and 13 finally participated (one withdrew). 13 patients had DW-MRI and FLT-PET/CT at baseline (10 had both), 12 were re-imaged at Day 14 (eight dual-modality) and nine after completing chemotherapy, immediately before surgery (six dual-modality). Four patients did not have surgery, three due to progressive disease and one due to toxicity. An overview of imaging scans and surgery status is presented in Table 2.
Of the 12 patients imaged at Day 14, three met the criteria for response of the primary lesion (>30% reduction in unidimensional measurement) on CT performed prior to surgery (Table 3). Eight patients were classed as non-responders; the numbers of patients imaged at each time-point with each imaging modality are given in Table 4. The remaining patient was not included in the response assessment because toxicity to carboplatin plus vinorelbine prompted a change to the targeted agent erlotinib.
Re-staging the patients with either DWI or FLT was outside the remit of the study, and repeat mediastinoscopy was not routinely performed. Repeat CT pre-surgery detected a new lesion in one patient in our cohort, who was therefore classified as a non-responder.

Correlation of imaging parameters and pathological measures
Early changes in imaging parameters were compared with pathological measures (Figure 2). In patients who underwent surgery and had imaging at Day 14 (n = 8), there was no meaningful [...] Table 4. The relative changes in imaging metrics are presented in Table 5 for DWI and Table 6 for FLT. ADCmedian increased (p < 0.001) and SUVmean decreased (p < 0.001) significantly between baseline and Day 14; however, the change between Day 14 and surgery was less marked (Table 5), indicating that the changes occur early.
The changes in ADCmedian, ADC75th and ADCIQR at Day 14 relative to baseline were generally larger in responders than non-responders (Figure 3a-c), although small numbers precluded statistical evaluation. Early changes in ADC25th, however, were comparable between responders and non-responders.
The decreases in SUVmean, SUVmax, SUVpeak and TLP at Day 14 relative to baseline were generally larger in responders than non-responders (Figure 3d-f); again, small numbers precluded statistical evaluation.

Changes with respect to measurement repeatability
Changes in imaging biomarkers at Day 14 were also assessed with reference to previously established test-retest estimates of measurement repeatability in order to exclude changes that represent measurement variability (Figure 4). 24,25 Increases in ADCmedian greater than the limits of measurement repeatability were seen in four patients (2/3 responders, 2/7 non-responders). Measurable increases in ADC75th were seen in all three responders but in only 1/7 non-responders, while ADC25th increased in only one responder and two non-responders. Measurable decreases at Day 14 were observed in seven patients for SUVmean (all three responders and 4/6 non-responders) and SUVmax (2/3 responders, 5/6 non-responders), and in six patients for SUVpeak (2/3 responders, 4/6 non-responders) and TLP (all three responders and 3/6 non-responders).

Comparison of imaging volumes
Tumour volumes generally decreased between baseline and Day 14 for all imaging modalities, with mean ± standard deviation [...] Figure 5.
There was no significant correlation between the ∆ADCmedian at Day 14 and the DWI-assessed tumour volume change measured at Day 14 (r = −0.39, one-sided p = 0.68) or prior to surgery (r = −0.14, one-sided p = 0.93). There was also no significant correlation between the ∆SUVmean at Day 14 and the FLT tumour volume change measured at Day 14 (r = −0.08, one-sided p = 0.95) or prior to surgery (r = 0.25, one-sided p = 0.75).

BJR|Open
Original research: Functional imaging biomarkers of response in operable lung cancer

Discussion
This pilot study showed no significant association between changes in imaging biomarkers (ADC and FLT-SUV) and pathological measures of response. Despite the small numbers, this is a robust finding because even the confidence intervals did not reach the correlations originally hypothesised. Therefore, it is unlikely that these imaging biomarkers relate to the pathological features as quantified here. It is also unlikely that we would have seen an association between the imaging parameters and histopathology had the planned sample size been achieved. The original study design did not include a futility analysis, but had one been included, we could not have ethically justified continued recruitment in light of the poor response to neoadjuvant chemotherapy in these patients. Our primary pathological measure was the percentage of viable residual tumour, which may not have been the best histopathological endpoint, as pathological evaluation was semi-quantitative. Tumour cellularity derived from digital pathology analyses is a more robust measure that could be considered for future validation of imaging biomarkers against histopathology. Secondary analysis found a positive association between pre-surgical FLT-SUV and Ki67 in our data; however, the literature on the relationship between these two parameters is ambiguous. Positive correlations have been found between the two in gliomas, 28 breast cancer 29 and a mixture of lung nodules, 15 but a number of studies in a variety of tumour types have found no relationship. 30-32
Our exploratory analysis suggests that changes in imaging biomarkers (measured in a multicentre, multivendor setting against established reproducibility criteria) occur before unidimensional tumour size changes of >30%. ADC generally increased over time, which has been linked to an increase in necrotic and apoptotic cell death induced by treatment. 33 Consistent with previous studies, 34,35 patients who responded to treatment in our sample generally had greater ADCmedian increases at Day 14 relative to baseline, but statistical analysis was not appropriate given the small sample size and small number of responders. In other NSCLC data, early (after one cycle of chemotherapy) increases in ADC have been associated with increased progression-free and overall survival 34 and increased tumour volume reduction. 35 In our data, responders also had larger early increases in ADC75th than non-responders, which could reflect an increase in necrotic domains following treatment. Although relatively unexplored in NSCLC, changes in ADC histogram parameters have been associated with improved response in other tumour types. 36-38
FLT-SUV generally decreased by Day 14 but changed inconsistently thereafter. Responding patients had amongst the highest FLT-SUV values at baseline, which is expected as chemotherapy targets proliferative cells. They also had greater early decreases in SUVmean and TLP; again, the small numbers precluded statistical analysis. A previous study (n = 9) assessing NSCLC treatment response using FLT-PET 19 found that FLT parameters did not distinguish between responders and non-responders after one cycle of chemotherapy, so the use of FLT-PET to indicate response remains debatable.
The use of test-retest metrics to establish measurement variability is critical, particularly in a multicentre setting where multiple scanner platforms add variability. Repeatability ("closeness of the agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement") can be represented by a limits-of-agreement interval. 39 Our previous test-retest data from multiple centres, including those in this study, in larger series for ADC 24 and FLT 25 gave repeatabilities of 22 and 25%, respectively. Therefore, we are confident that the changes measured here exceed measurement variability. For some outcomes (such as ADC75th), changes were typically observed more often in patients classified as responders. Importantly, changes in imaging biomarkers preceded volumetric changes. If future research confirms a strong association between (non-)response and the presence (or absence) of an early biomarker change exceeding measurement variability, then these biomarkers could be used for early detection of (in)effective treatment.
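The comparison of an observed change with measurement repeatability can be illustrated with a simple threshold check. The 22% (ADC) and 25% (FLT) figures are the repeatabilities quoted above; treating them as symmetric limits on percentage change is an assumption, and the names are hypothetical.

```python
# Repeatability estimates quoted in the text, as percentages of the baseline value.
REPEATABILITY_PCT = {"ADC": 22.0, "FLT": 25.0}

def exceeds_repeatability(percent_change, biomarker):
    """True if the magnitude of a percentage change is larger than the
    test-retest repeatability of that biomarker (assumed symmetric limits)."""
    return abs(percent_change) > REPEATABILITY_PCT[biomarker]

# Hypothetical Day 14 changes for one patient.
adc_change_is_measurable = exceeds_repeatability(30.0, "ADC")
suv_change_is_measurable = exceeds_repeatability(-20.0, "FLT")
```

In this example the 30% ADC increase would count as a measurable change, whereas the 20% SUV decrease would fall within measurement variability.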
NSCLC responded poorly to platinum-based neoadjuvant chemotherapy (four patients became inoperable). Unfortunately, response assessment based on size is confounded by peritumoral atelectasis and inflammation as these regions are often indistinguishable from residual tumour on CT. This is more problematic when segmenting volumes: CT volumes increased with treatment in two of the three patients classified as responders, likely caused by the inclusion of inflammatory tissue.
This study had several limitations. Firstly, the stringent inclusion criteria required patients to be operable at the outset. The use of neoadjuvant chemotherapy in these cases represented a departure from standard-of-care and severely limited recruitment. Recruitment was also severely affected by three cases of disease progression on chemotherapy. Secondly, although the preclinical literature indicates that ADC is a biomarker of necrosis/apoptosis, in a clinical setting histopathological analysis is influenced by technical aspects and observer interpretation, so that the data are less reliable. 40 Correlation of imaging biomarkers with histology is difficult at a whole-lesion level because in large tumour specimens, as here, selected sections may not be representative of the entire lesion. Image analysis of whole digitised pathology specimens would address these issues in future. Thirdly, progressive disease due to new metastases, rather than an increase in primary tumour size, confounded response classification in two cases. Finally, two patients withdrew consent for MRI either during or following MRI examination. The tolerability of multiple imaging examinations in patient groups with poor performance status and compromised respiratory function is an important consideration when planning future studies.
In conclusion, this study adds to the body of evidence documenting longitudinal changes in imaging biomarkers and indicates their lack of correlation with traditional histological markers of response and non-response. Changes in ADC and FLT following neoadjuvant chemotherapy in NSCLC occur as early as 14 days after initiating treatment and exceed measurement variability in responders. However, the utility of early changes in imaging biomarkers as well as the baseline biomarker levels in predicting (non-) response as defined by clinical outcome or tumour size remains to be established.

Conflict of interest
John Waterton has received compensation from Bioxydyn Ltd, a for-profit company engaged in the development and provision of imaging biomarker services. All other authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Informed consent
Written informed consent was obtained from all subjects (patients) in this study.