Apparent diffusion coefficient agreement and reliability using different region of interest methods for the evaluation of head and neck cancer post chemo-radiotherapy

Objectives: Post chemoradiotherapy (CRT) interval changes in apparent diffusion coefficient (ADC) have prognostic value in head and neck squamous cell cancer (HNSCC). The impact of using different region of interest (ROI) methods on interobserver agreement and their ability to reliably detect the changes in the ADC values was assessed. Methods: Following ethical approval, 25 patients (mean age 59.5 years, 21 male) with stage 3–4 HNSCC undergoing CRT were recruited for this prospective cohort study. Diffusion weighted MRI (DW-MRI) was performed pre-treatment and at 6 and 12 weeks following CRT. Two radiologists independently delineated ROIs using whole volume (ROIv), largest area (ROIa) or representative area (ROIr) methods at primary tumour (n = 22) and largest nodal (n = 24) locations and recorded the ADCmean. When no clear focus of increased DWI signal was evident at follow-up, a standardised ROI was placed (non-measurable or NM). Bland-Altman plots and interclass correlation coefficient (ICC) were assessed. Paired t-tests evaluated interval changes in pre- and post-treatment ADCmean at each location, which were compared to the smallest detectable difference (SDD). Results: Excellent agreement was obtained for all ROI methods at pre-treatment (ICC 0.94–0.98) and 6-week post-treatment (ICC 0.94–0.98). At 12-week post-treatment, agreement was excellent (ICC 0.91–0.94) apart from ROIr (ICC 0.86) and the NM nodal disease (ICC 0.87). There were significant interval increases in ADCmean between pre-treatment and post-treatment studies, which were greater than the SDD for all ROIs. Conclusions: ADCmean values can be reproducibly obtained in HNSCC using the different ROI techniques on pre- and post-CRT MRI, and this reliably detects the interval changes.


Introduction
Advanced head and neck squamous cell carcinoma (HNSCC) is treated by chemoradiation therapy (CRT) at most tumour sites; however, residual disease is still seen at locoregional sites in more than 25% of patients. 1 Unfortunately, this can be difficult to detect clinically, with post-treatment scarring potentially masking sites of active disease, reducing the window of opportunity for curative surgical treatment. Metabolic imaging with the earlier detection of residual disease. An alternative imaging approach is with diffusion-weighted magnetic resonance imaging (DW-MRI), which may combine a qualitative analysis with a quantitative measure of the apparent diffusion coefficient (ADC) for the identification of cellular tumour. In particular, a change in ADC between pre-treatment and post-treatment MRI studies has been proposed as a biomarker for treatment response, with a reduction in cellularity and progressive necrosis resulting in increased ADC values when treatment is successful. [2][3][4] There are differing methods used for the acquisition of ADC values with regions of interest (ROIs) being used to sample the whole volume, a maximum crosssectional area or an area excluding necrosis which is representative of "viable" tumour. 5 In order for changes in ADC values to be applied to the detection of residual tumour, it must be ensured that the measurements are reproducible and that any variation in ADC values between different observers is less than the interval changes between pre-treatment and post-treatment ADC values. This may be particularly relevant when post-treatment tumour regression obscures the target for ROI placement on the posttreatment MRI sequences.
We aimed to measure the interobserver agreement of ADC mean values recorded at primary tumour and largest pathological nodal locations on the pre-treatment as well as 6-week and 12-week post-CRT DW-MRI in patients with Stage 3-4 HNSCC. Furthermore, the interobserver agreement was calculated for three different methods of obtaining ROIs and the reliability of the ADC mean measurements was assessed for its ability to detect the interval changes in the ADC values.

Patient selection and details
Adult patients were eligible if they had proven Stage 3 or 4 primary HNSCC and a ≥ 1 cm 2 measurable area of disease at the primary or nodal site, and in whom curative primary chemoradiotherapy or radiotherapy (RT) alone was planned. Patients were recruited between May 2014 and May 2015, and staging was performed according to the seventh edition TNM classification for head and neck cancer. Diagnostic biopsies were obtained from the primary tumour (n = 21), lymph node (n = 3) or both sites (n = 1). Patients were excluded if they had prior chemotherapy or RT or evidence of distant metastatic disease.

MRI
Patients underwent DW-MRI on a 1.5 Tesla Siemens Magnetom Aera system (Siemens Medical Systems GmbH, Erlangen, Germany). Axial echo planar DW-MRI was acquired with multiple b-values (0, 50, 100, 800 and 1500 s/mm 2 ) and TR 5900 ms, TE 60 ms, two signal averages, FOV 240 × 240 mm, slice thickness 4 mm with a 4 mm slice gap. Mono-exponential ADC maps were calculated from the b = 100 and b = 800 values.

ROI delineation
Two independent radiologists (3 and 7 years experience) delineated ROIs using Osirix v.8.0.2 software on the DWI b = 800 s/mm 2 map, but with access to the post gadolinium fat-saturated T1 axial sequence. ADC mean values were recorded at primary tumour and the largest pathological lymph node locations according to areas demonstrating DWI hyperintensity. By a priori definition, the largest pathological node needed to be >1 cm 2 in order to be considered as measurable disease. If not clearly pathological on imaging criteria (>1 cm short axis/necrosis/extranodal involvement) then they would have undergone FNA to confirm as pathological. Any areas of necrosis were defined by cross-referencing to areas of either high signal on the b = 0 map or absence of gadolinium enhancement.
Three freehand separate regions of interest (ROIs) were placed individually within measurable primary tumour and/or largest lymph node on the baseline pretreatment images: (1) A volumetric ROI placed on multiple sections to encompass the whole of the primary tumour/largest lymph node (ROI v , Figures 1a-h and 2a-g).

Figure 1
Volumetric ROI (ROI v ) of the largest lymph node on the b800 image disease was manually delineated (a-h) on the b800 DWI sequence. ROI a was analysed on slice e. i is a magnified view of the node in panel e with an example of a representative ROI (ROI r ). This was focused on an area of increased DWI signal on the b800 image but excluded any areas of necrosis.
(2) An area ROI placed around the maximum crosssectional area of the primary tumour/largest lymph node (ROI a , Figures 1e and 2d). (3) A representative area ROI placed on the maximum cross-sectional area of the primary tumour/ largest lymph node in the core of the lesion (ROI r , Figures 1i and 2h). This ROI was focused on an area of increased DWI signal on the b = 800 s/mm 2 map but excluded any areas of necrosis defined by crossreferencing to areas of high signal on b = 0 map or the gadolinium enhanced images.
The ROIs were then transferred to the corresponding ADC maps generated from the monoexponential b = 100 and b = 800 DWIs. If there was no longer a focus of increased DWI signal on the post-treatment studies to target, a standardised 6-mm diameter circular ROI was placed at its original location ( Figure 2i) and this was termed nonmeasurable (NM) disease. In these cases, it was not possible to perform ROI v and ROI a analysis.
As a reference to assess for measurement stability across timepoints, an ovoid ROI was also placed by one radiologist within the spinal cord at the level of the C2 vertebral body and the ADC mean was recorded.

Statistical analysis
Interobserver agreement in ADC mean measurements was assessed for pre-treatment, 6-week post-treatment and 12-week post-treatment ROIs using the Bland-Altmann method, as implemented in the Statistics Toolbox in Matlab R2018a (The MathWorks Inc, Natick, Massachusetts, USA). After assessing for data normality using Kolmorogov-Smirnov tests, paired Student's t-tests were performed. Separate analysis was performed for the post-treatment imaging when termed non-measurable (NM) disease was present. The interobserver intraclass correlation coefficient (ICC) was calculated.
The mean change in ADC between each of the timepoints using all three ROI methodologies was compared with the smallest detectable difference (SDD). This is the smallest difference between interval ADC values that can be interpreted to be "real" when accounting for the potential interobserver variation (based on the agreement statistics). It is, therefore, a conceptually useful result since interval changes larger than the SDD should not simply result from interobserver variation in the measures. The significance level, α, was set to 0.05 and the statistical power, 1-β, to 0.9.
The spinal cord ADC mean data were normally distributed and were compared between different time points with a paired Student's t-test.

Results
25 patients (21 male, four female) were recruited with a mean age of 59.5 (range 44.4-73.9) years. There were 18 oropharyngeal (17 HPV positive), four hypopharyngeal and three laryngeal (stage III n = 4, stage IVa n = 21) tumours. ROIs were delineated at primary tumour (n = 22) or largest nodal (n = 24) locations. In three of the oropharyngeal (tonsillar) tumours, a primary lesion could not be reliably delineated and one patient with a tongue base tumour had no pre-treatment nodal disease.
At 6 weeks, 20 of 22 (20/22) primary tumours and 10/24 largest lymph nodes were non-measurable. At 12 weeks, 22/22 primary tumours and 19/24 largest lymph nodes were non-measurable. Image artefact precluded accurate ADC mean measurement from ROIs placed at the site of NM primary and nodal disease in one case at 12-week post-treatment, which was excluded from subsequent comparisons. In all comparisons where n ≥ 5 in each group, ADC mean values were found to be normally distributed.
There was excellent interobserver agreement using all three ROI methods for the pre-treatment evaluation of ADC mean values at nodal and primary tumour locations across all three ROI methodologies (ICC 0.94-0.98) ( Table 1 and Figure 3).
At 6-week post-treatment, there was excellent agreement for the assessment of the 14/24 cases of measurable and the 10/24 cases of NM nodal disease (ICC 0.95-0.98) as well as the 20/22 cases of NM primary Volumetric ROI (ROI v ) of a primary HNSCC, and a standardised ROI placed at the site of non-measurable (NM) posttreatment disease. A primary left-sided oropharyngeal tumour is manually delineated on the b800 image (a-g). ROI a was recorded from the ROI transferred from slice d. h is a magnified view of the tumour in slice d with an example of a representative area ROI (ROI r ). There was no measurable (NM) disease at the site of the original tumour at 12-week post-treatment (i) so a standardised 6-mm diameter circular ROI (circle) was placed at its original location. tumour (ICC 0.94) ( Table 1 and Supplementary Figure  1).
At 12-week post-treatment, there was excellent interobserver agreement for the ROI v (ICC 0.91) and ROI a (ICC 0.93) analysis of the 5/24 cases of measurable nodal disease, and for the evaluation of the 21/22 NM cases of primary tumour. There was good interobserver agreement for the ROI r (ICC 0.86) analysis of the 5/24 cases of measurable nodal disease and the 18/24 cases of NM (ICC 0.87) nodal disease at 12-week post-treatment (Table 1 and Supplementary Figure 2).
There was no significant difference in values recorded at the site of primary tumour or the largest node between observers using any of the three ROI techniques at any timepoint (paired Student's t-test p-value > 0.10 in all cases).
There was a statistically significant post-treatment increase in ADC mean at both primary and nodal sites from pre-treatment to both 6-week and 12-week posttreatment DW-MRI studies irrespective of the ROI method with a paired/group Student's t-test p-value < 0.002 in all cases (Table 2 and Supplementary Figure 3). These interval increases in ADC mean were larger than the SDD for all ROI methods ( Table 2). The ADC mean only increased further between 6-and 12-week post-treatment at sites of NM nodal disease where the interval change also exceeded the SDD.
The mean difference in ADC was significantly larger than the SDD for all ROI methodologies between baseline compared to subsequent timepoints. Between 6and 12-week post-treatment, only ROIs placed at the previous site of primary nodal but now NM disease showed a larger difference in ADC than the SDD.
The spinal cord ADC was normally distributed at each timepoint and there was no significant difference in cord ADC mean between timepoints (baseline vs 6-week post-treatment p = 0.80, baseline vs 12-week posttreatment p = 0.07, 6-vs 12weeks post-treatment p = 0.07).

Discussion
Our findings indicate excellent interobserver agreement using all three ROI methods for the pre-treatment and 6-week post-treatment evaluation of ADC mean values at nodal and primary tumour locations. The 12-week post-treatment interobserver agreement was also Table 1 Interobserver comparison of ADC mean measurements in ROIs placed within the largest node or primary tumour at pre-treatment and 6-and 12-week post-treatment DW-MRI  excellent apart from measurable ROI r and NM nodal disease.There were significant changes in ADC mean between 0-6 week and 0-12 week studies, and this was of a magnitude greater than the SDD calculated from the interobserver variation for all ROIs analysed. Successful CRT treatment results in an increase in HNSCC ADC and when there is a reduction in the expected interval rise in ADC, this is an association with locoregional treatment failure. [2][3][4] Such interval changes in tumoural ADC or other diffusion MRI parameters values have been evaluated intratreatment and post-treatment as a prognostic guide. [2][3][4][6][7][8] The majority of studies have found that an increased absolute or ADC mean or a greater rise in ADC mean from pretreatment to either intratreatment or post-treatment studies, although the finding is not universal. High pre-treatment ADC and low rise in early intratreatment or post-treatment ADC with CRT were consistently observed to be indicators of locoregional failure in the systematic review by Chung et al 4 , although studies were few and heterogeneous. King et al 2 found that post-treatment ADC showed a significant correlation with locoregional failure at 6 months, with ROC curves determining that an ADC of a post-treatment mass of ≤1.4 ×10 −3 mm 2 /s achieved 100% specificity and 45% sensitivity for locoregional failure. The optimal threshold of the change in ADC prior to and three weeks following the end of CRT for differentiating responding from non-responding primary HNSCC lesions was found in one paper to be 25%. 3 In another study, a significant increase in ADC was seen in complete responders within 1 week of CRT treatment for HNSCC, which remained high until the end of the treatment. 6 This group showed significantly higher increase in ADC than the partial responders by the first week of CRT.

Pre-treatment ADC measurements
The reproducibility of ADC measurements may vary depending on the technique used to select the ROIs and this impacts on the ability of ADC measurements to detect differences between serial posttreatment studies. Previous data on pre-treatment interobserver agreement in the assessment of ADC values for HNSCC have demonstrated ICC statistics ranging between 0.79 and 0.96. [9][10][11] This is potentially influenced by choice of sequence used to define the ROIs, scan parameters, image distortion, choice of nodal versus primary tumour location and lesion size. Our study revealed excellent agreement on pretreatment evaluation of ADC using all ROI methods at both nodal and primary tumour locations.
To our knowledge, interobserver agreement in ADC measurements following treatment for HNSCC has not been previously assessed; however, this is critical if interval changes are to be used in the evaluation of Figure 3 Bland-Altmann plots of ADC mean measurements for ROIs placed within the largest node and the primary tumour on pre-treatment DW-MRI. The horizontal solid line on each plot represents the mean difference in recorded ADC between the two observers, with the hatched lines delineating LoA. Excellent interobserver agreement in ADC mean measurement at baseline at both the site of the primary tumour and within the largest involved lymph node observed across all three ROI methodologies. ICC, Intraclass correlation coefficient; LoA, Limits of agreement. treatment response. The post-treatment setting creates the added challenge of defining ROIs after regression of a mass lesion, when there may no longer be a clear focal target for the ROI. 12 When there is no residual post-treatment tumour, it has been described how the ADC may be analysed according to the site of the pretreatment lesion in studies at other tumour sites. 13 Our results indicated that such NM disease could also be measured with good or excellent interobserver agreement and that the reliability was sufficient to detect the interval changes in ADC values from pre-treatment to 6-and 12-week post-treatment studies.
The impact on reproducibility of using different ROI methods for the assessment of ADC mean has only been applied to thyroid nodules in the head and neck. There was found to be excellent agreement for malignant thyroid nodules using ROI a and ROI v but only fair agreement for ROI r . 14 This concurs with studies at other tumour sites which have demonstrated decreased interobserver agreement when using smaller ROIs. [15][16][17] Our data also indicated the potential for reduction in interobserver agreement with ROI r placed on 12-week post-treatment lymph nodes. This may relate to the additional interpretation required for placing the ROI in an area of solid viable tumour. Rather than assessment based on ADC mean values, Ren et al. investigated the influence of different ROI selection methods on the histogram analysis of ADC maps of locally advanced HNSCC and found decreased agreement when analysing ROI a (ICC 0.51-0.85) compared to ROI v (0.77-0.96). 18 Whilst our results did not reveal any reduction in the agreement with ROI a , it could conceivably introduce bias due to the subjective nature of selecting the appropriate section, which is eliminated by sampling the whole lesion with ROI v .
The major limitation of our study is the small sample size and the few patients demonstrating measurable disease following treatment. There were only 5/23 measurable lymph nodes and 0/22 measurable primary tumour at 12-week post-treatment, which precluded adequate assessment of reproducibility in measurable disease post-treatment. This is likely to be due to the preponderance of HPV related oropharyngeal cancer in our study group. This epidemiologically, histologically and clinically distinct form of HNSCC has a well-documented favourable response to CRT. Second, the comparisons were performed exclusively for the mean ADC values and other ADC parameters were not assessed. Although mean ADC values are the most frequent ADC values to be applied clinically, minimum ADC values have also been used for the evaluation of HNSCC. Finally, it is appreciated that there Table 2 Comparison of the mean change in ADC mean between pre-treatment to 6-week post-treatment, pre-treatment to 12-week post-treatment, and 6-week post-treatment to 12-week post-treatment Paired/Group Student's t-tests compared the mean change in ADC mean between the three timepoints using the three different ROI methodologies. Where these changes were statistically significant and larger than the smallest detectible difference (SDD), which accounted for interobserver variation in the measures, the p-values are highlighted in bold.

ADC change between pre-treatment and 6-week post-treatment
are other factors such as the sequence parameters, the choice and number of b values, the variation in field homogeneity, coil selection between MRI systems 19 and the postprocessing method which impact on the standardisation and reliability of ADC measurements in HNSCC. All other confounding factors on ADC estimates were set constant in this study in order to determine the exact effect of ROI method and posttreatment effects; however, our results cannot necessarily be translated into other clinical scenarios where they may differ.

Conclusion
ADC mean values at existing or previous sites of HNSCC can be reproducibly obtained using all three different ROI techniques, even when there is no increased DWI signal as a target. A significant rise in the ADC mean was detected between pre-treatment and either 6-week or 12-week post-treatment studies, and this change was of a greater magnitude than the SDD in measured ADC mean values. This provides further evidence that ADC may be suitable as an imaging biomarker to assess for treatment response between pre-treatment and 6-or 12-week post-treatment MRI studies.