Test-retest repeatability of ADC in prostate using the multi b-Value VERDICT acquisition

Purpose: VERDICT (Vascular, Extracellular, Restricted Diffusion for Cytometry in Tumours) MRI is a multi b-value, variable diffusion time DWI sequence that allows generation of ADC maps from different b-value and diffusion time combinations. The aim was to assess the precision of prostate ADC measurements from varying b-value combinations using VERDICT and to determine which protocol provides the most repeatable ADC. Materials and Methods: Forty-one men (median age: 67.7 years) from a prior prospective VERDICT study (April 2016–October 2017) were analysed retrospectively. Men who were suspected of having prostate cancer and were scanned twice using VERDICT were included. ADC maps were formed using 5 b-value combinations, and the within-subject standard deviation (wSD) was calculated per ADC map. Three anatomical locations were analysed per subject: normal TZ (transition zone), normal PZ (peripheral zone), and index lesions. Repeated measures ANOVAs showed which b-value range had the lowest wSD; Spearman correlation and generalized linear model regression analysis determined whether wSD was related to ADC magnitude and ROI size. Results: The mean lesion ADC for b0 b1500 had the lowest wSD in most zones (0.18–0.58 × 10⁻⁴ mm²/s). The wSD was unaffected by ADC magnitude (Lesion: p = 0.064, TZ: p = 0.368, PZ: p = 0.072) and lesion Likert score (p = 0.95). wSD decreased with ROI size pooled over zones (p = 0.019, adjusted regression coefficient = −1.6 × 10⁻³; ROIs were larger for TZ than for PZ, and larger for PZ than for lesions). ADC maps formed with a maximum b-value of 500 s/mm² had the largest wSDs (1.90–10.24 × 10⁻⁴ mm²/s). Conclusion: ADC maps generated from b0 b1500 have better repeatability in normal TZ, normal PZ, and index lesions.


Introduction
Multiparametric MRI (mpMRI) is an established imaging technique for investigating suspected prostate cancer [1]. Diffusion Weighted Imaging (DWI) and the derived Apparent Diffusion Coefficient (ADC) maps are a central part of the mpMRI protocol. PI-RADS V2.1 (Prostate Imaging-Reporting and Data System) acknowledges that ADC is often calculated from ≥ 2 b-values and that ADC is affected by b-value choice [2]. If only 2 b-values are acquired, the recommendation is to use 100 s/mm² (preferably 50-100 s/mm²) for the lowest and 800-1000 s/mm² for bmax. This range is achievable by most clinical scanners and should avoid the diffusion kurtosis effect seen at higher b-values. Due to a lack of ADC repeatability and reproducibility data, current PI-RADS uses ADC only as a qualitative component of prostate cancer risk stratification. Recently implemented guidelines by the European Association of Urology [3] (EAU, 2019) advocate the use of quantitative ADC thresholds for detection of low-grade cancer to allow a ~30% reduction in unnecessary biopsies [4].
For quantitative prostate ADC use, QIBA (Quantitative Imaging Biomarkers Alliance) suggests that 2 b-values are included: a lower b-value of 50-100 s/mm² and bmax = 500-1500 s/mm² [5]. QIBA stresses the importance of assessing the precision (repeatability) of ADC to define confidence intervals (CIs) for quantitative ADC thresholds, which can be determined by calculating the wSD (within-subject standard deviation) of test-retest scans [6]. For longitudinal studies, e.g. assessing therapy response, wSD can be used to determine which changes in ADC are significant [5,6]. To establish CIs for cross-sectional diagnostic thresholds [5,6], an estimate of bias (accuracy) would be required in addition to repeatability. The range and combination of b-values have an impact on prostate ADC, with significant differences observed between ADC from 2 b-values and ADC from multiple b-values [7,8]. Furthermore, bmax may impact clinical utility; one study showed greater accuracy at predicting prostate cancer with ADC from bmax = b1000 compared to bmax = b2000, whereas the opposite has also been demonstrated [9]. Since ADC values vary among the prostate zones and lesions (e.g., lower for TZ and cancer), different b-value combinations are expected to have varying impact on ADC contrast and repeatability in different prostate locations.
To date, a handful of rigorous prostate ADC repeatability studies, with limited subject numbers (<12) [10-12], have shown low prostate ADC repeatability (~50%) compared to ADC of other organs [5]. QIBA recommends > 35 subjects for test-retest studies to achieve nominal confidence intervals for the repeatability estimate [13]. The larger multicentre ADC repeatability study of 29 subjects in the imaging arm of the ACRIN 6701 trial [14] has recently reported encouraging observations of better than 10% ADC (b = 0, 800 s/mm²) repeatability for the whole prostate scanned on the same day or up to two weeks apart, but included evaluation of only 10 lesions, with much lower reported ADC repeatability (up to 40%). Due to small subject numbers and the challenges of DWI prostate acquisition and analysis, previous studies were not statistically powered to assess the dependence of ADC repeatability on acquisition protocol (b-values and diffusion times) and analysis (lesion size and location). Such dependencies are of interest for radiologists to improve prostate ADC precision and enable the inclusion of quantitative ADC information in future PI-RADS guidelines.
VERDICT (Vascular, Extracellular, Restricted Diffusion for Cytometry in Tumours) is an advanced diffusion imaging technique that utilises six b-values (0, 90, 500, 1500, 2000 and 3000 s/mm²) with varying diffusion times, and a mathematical model to derive quantitative intra- and extra-cellular tissue fractions [15]. VERDICT derived parameters have demonstrated clinical promise to distinguish between high and low grade prostate cancer [16], and to avoid unnecessary biopsies [17]. The acquired diffusion data also allows generation of ADC maps using different permutations of b-values and diffusion times. Given the potential clinical utility, and that the sequence consists of multiple b-values, it would be beneficial to establish if a repeatable ADC could also be derived from the VERDICT scan, as opposed to acquiring an additional DWI sequence separately.
Detection of clinically significant changes in ADC requires determination of the baseline variability in ADC measurements [6]. This study sought to establish which b-value combination provides the most repeatable prostate ADC, using the VERDICT acquisition for test-retest assessment of > 35 subjects.

Materials and Methods
The present study retrospectively analysed data from a repeatability arm (n = 41) of a prospective cohort study (n = 70) called INNOVATE (ClinicalTrials.gov identifier NCT02689271), which focused on the diagnostic utility of VERDICT parameters, rather than ADC repeatability, due to varying diffusion times for different b-value combinations [16]. Ethical approval was obtained from the London-Surrey Borders Research Ethics Committee and written informed consent was obtained from study participants. The full study protocol is available at [18]. Prostate Cancer UK funded the trial.

Study participants
Men were eligible if they had clinical suspicion of prostate cancer due to either a raised PSA or a suspicious digital rectal examination, or were undergoing active surveillance. Participants were excluded if they had undergone prostate cancer treatment, were receiving ongoing hormonal prostate cancer treatment, or had received a biopsy within 6 months before their MRI.
The repeatability cohort included 41 men who had two identical multi b-value VERDICT acquisitions with an interval ≤ 5 min (median age: 67.7 years; range: 50-82 years). Of these 41, 10 men were randomly selected to vacate the scanner after their first scan, which necessitated repositioning for the second acquisition. The remaining 31 stayed in the scanner between acquisitions and were not repositioned. Patient demographics are listed in Table 1. The 2 repositioning groups were analysed separately. All men underwent a single clinical mpMRI in addition to the repeated multi b-value VERDICT acquisition on a 3 T MRI scanner. The mpMRI was assessed by experienced uro-radiologists (>10 years' experience in mpMRI).
Following mpMRI, 19 men with a PI-RADS score ≥ 3 had targeted transperineal biopsy of identified lesions. Histologic examinations from the biopsy cores were evaluated in the standard clinical fashion and assigned an overall Gleason Grade [19] (A.F. and M.R., 13 and 15 years' prostate pathology experience, respectively).

MR Imaging and image analysis
All imaging was performed on a 3 T Philips Achieva scanner. Hyoscine butylbromide (Buscopan, Boehringer Ingelheim, Ingelheim am Rhein, Germany; 0.2 mg/kg, up to 20 mg) was intravenously administered prior to imaging to reduce peristalsis.
Details of the multi b-value VERDICT sequence are in Table 2 (Acquisition Time: 12 min 25 s). Patients also underwent mpMRI in the same session [20].
ADC maps were generated from subsets of the multi b-value VERDICT data. Trace images were generated for each b-value. These were registered to the b0 image of the b3000 acquisition from the first scan for each subject. ADC maps were generated on a voxel-by-voxel basis with an in-house model implemented in MATLAB (version 2020a, MathWorks Inc., Natick, MA, USA) using Eq. (1), where S(b) is the signal at a given b-value, and S_0 is the signal with no diffusion weighting:
S(b) = S_0 · exp(−b · ADC)    (1)
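As an illustration of the voxel-wise calculation described above, the following Python sketch estimates ADC from a log-linear least-squares fit. It assumes that Eq. (1) is the standard mono-exponential model, S(b) = S_0 exp(−b · ADC). The function name `fit_adc` and the synthetic signal values are hypothetical, and this is not the in-house MATLAB implementation used in the study.

```python
import math


def fit_adc(b_values, signals):
    """Estimate ADC (mm^2/s) for one voxel from a log-linear least-squares fit.

    Assumes the mono-exponential model S(b) = S0 * exp(-b * ADC), so that
    ln S(b) is linear in b with slope -ADC. b_values are in s/mm^2.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matched (b, S) pairs")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")
    n = len(b_values)
    logs = [math.log(s) for s in signals]
    b_mean = sum(b_values) / n
    log_mean = sum(logs) / n
    sxx = sum((b - b_mean) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be identical")
    sxy = sum((b - b_mean) * (l - log_mean) for b, l in zip(b_values, logs))
    return -sxy / sxx  # slope of ln S versus b is -ADC


# Synthetic, noise-free voxel (hypothetical values, not study data).
true_adc = 1.2e-3  # mm^2/s
s0 = 1000.0
two_point = [0, 1500]
three_point = [0, 500, 2000]
adc_2 = fit_adc(two_point, [s0 * math.exp(-b * true_adc) for b in two_point])
adc_3 = fit_adc(three_point, [s0 * math.exp(-b * true_adc) for b in three_point])
print(adc_2, adc_3)
```

With only two b-values the fit reduces to the closed form ADC = ln(S(b_low)/S(b_high)) / (b_high − b_low). With three or more b-values it gives the least-squares slope of ln S against b.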
Various b-value combinations were used to generate ADC maps to test ADC repeatability. In total, 5 versions of ADC maps were created for each subject for scan1 and scan2 (Table 3). These 5 b-value combinations were selected from all those available to ensure a difference of ≤ 15 ms between the scan TEs (to alleviate the inconsistent T2 weighting for different b-values that is intrinsic to the VERDICT protocol; Table 2), and excluded b3000, which may not be clinically achievable. TE = 80 ms was used for all combinations that included b0 (the b0 image from the bmax = 3000 acquisition of the full VERDICT sequence).
Regions of Interest (ROIs) were created using Mango Software (Research Imaging Institute, UTHSCSA). ROIs were drawn in 3 locations per subject: around the index lesion identified on mpMRI, in normal TZ and in normal PZ, by an experienced board-certified radiologist (S.S., 4 years' experience in mpMRI). As all trace images were registered to the b0 image of the b3000 acquisition from the first scan per subject, only one set of ROIs was needed per subject. These ROIs were then applied onto the registered ADC maps for scan2. Normal PZ and TZ were selected based on PI-RADS scores of 1 or 2. Analyses were confined within the ROIs; therefore, if ROI delineation was not possible within a certain location, that location was excluded for that subject. Furthermore, image quality was visually assessed, and if an ROI showed poor quality or artefacts it was also removed from analysis.

Statistical analysis
Statistical analysis was conducted in SPSS (version 24 [IBM, Armonk, NY]) and SAS, and hypothesis tests were two-sided. The assumption of normality for the wSD of the ADC was checked by performing a Shapiro-Wilk test, and also by examining the mean, median, and mode of the distribution, as well as skewness and kurtosis. To assess repeatability, the within-subject standard deviation (wSD) [5] was calculated for each ADC map in all 3 locations (normal TZ, normal PZ, and index lesions). This was achieved by calculating the variance for each subject, given by the squared difference between the two measurements divided by 2, taking the mean of these variances across all subjects (providing the within-subject variance), and then taking the square root of this value.
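The wSD calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS/SAS analysis used in the study; the function name `within_subject_sd` and the example ROI values are hypothetical.

```python
import math


def within_subject_sd(scan1, scan2):
    """Within-subject standard deviation for paired test-retest measurements.

    scan1[i] and scan2[i] are the mean ROI ADC values of subject i in the
    first and second scan. The per-subject variance is (difference^2) / 2;
    wSD is the square root of the mean of these variances.
    """
    if len(scan1) != len(scan2) or not scan1:
        raise ValueError("need the same non-zero number of subjects per scan")
    variances = [((a - b) ** 2) / 2.0 for a, b in zip(scan1, scan2)]
    within_subject_variance = sum(variances) / len(variances)
    return math.sqrt(within_subject_variance)


# Hypothetical mean ROI ADC values for two subjects (arbitrary units).
scan1 = [10.0, 12.0]
scan2 = [12.0, 12.0]
wsd = within_subject_sd(scan1, scan2)
print(wsd)  # per-subject variances are 2.0 and 0.0, mean 1.0, so wSD = 1.0
```

The result is in the same units as the input ADC values, which is why wSD can be compared directly across b-value combinations without normalising by the mean ADC.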
The wSD was calculated for all five ADC maps per region. The distributions of wSD, wSD 2 , and the log distributions were compared and the robustness of the wSD results was assessed with a model based on the logs. To test if repeatability was affected by the magnitude of the ADC value, for each b-value combination, GLMs (Generalized Linear Models) were fit where the dependent variable was the wSD and the independent variable was the magnitude of the ADC (using ADC mean from scan1 and scan2). To determine which b-value range provided the lowest wSD, repeated measure ANOVAs with Bonferroni adjustments for multiple comparisons were conducted for the separate locations. GLMs were performed for each region (TZ, PZ and index lesions). First, the interaction between b-value combination and whether or not the patient was repositioned was tested. If statistically significant (p < 0.05), these subjects were analysed separately, if not significant, the data was pooled and adjusted for the binary variable of whether or not the patient had been repositioned. A statistically significant difference between wSD values was then considered if p < 0.05.
Bland-Altman plots were used to identify trends in the differences and to construct limits of agreement (LOA) between the two repeated scans. Additionally, Spearman correlation coefficients were used to assess the association between ADC mean and lesion Likert score. The associations based on the correlation coefficients were interpreted as: 0.00-0.20 = negligible, 0.21-0.40 = weak, 0.41-0.60 = moderate, 0.61-0.80 = strong, and 0.81-1.00 = very strong. A relationship was considered statistically significant if p < 0.05.
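For readers unfamiliar with the limits of agreement, the sketch below shows one common way to compute them in Python. It assumes the conventional 95% limits (mean difference ± 1.96 × SD of the differences); the text does not state which multiplier was used, and the function name `bland_altman` and the example values are hypothetical.

```python
import statistics


def bland_altman(scan1, scan2):
    """Return (bias, lower LOA, upper LOA) for paired repeat measurements.

    bias is the mean of the scan2 - scan1 differences. The limits of
    agreement are assumed here to be bias +/- 1.96 * sample SD of the
    differences (the conventional 95% limits).
    """
    if len(scan1) != len(scan2) or len(scan1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(scan1, scan2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired ROI ADC values (arbitrary units, not study data).
scan1 = [1.0, 2.0, 3.0, 4.0]
scan2 = [2.0, 2.0, 4.0, 4.0]
bias, lower, upper = bland_altman(scan1, scan2)
print(bias, lower, upper)
```

Plotting each subject's difference against the mean of the two scans, with horizontal lines at the bias and the two limits, gives the Bland-Altman plot itself.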
To assess the effect of ROI size on repeatability, the data from normal TZ, normal PZ and index lesions were pooled. A GLM was fit where the dependent variable was the log of wSD and the independent variables were location (normal TZ, normal PZ, or index lesion) and voxel count within ROIs. We used GEEs (Generalized Estimating Equations) to account for the clustered nature of the data, treating patients as a random effect and with an exchangeable working correlation matrix structure, and assessed adjusted regression coefficient for ROI voxel count. A Wald test was used to determine if the regression coefficient for number of voxels was significantly different from zero. A statistically significant difference was considered if p < 0.05.

Results
A total of 41 men were included in the short-term repeatability study (31 who did not undergo repositioning between scan1 and scan2, and 10 who were randomly assigned to be repositioned). A flow chart detailing the patient numbers is shown in Fig. 1. In total, ADC repeatability was studied for 37 prostate regions in normal TZ, 30 in normal PZ and 35 index lesions (19 biopsied).
Of the 10 men who were repositioned between scans, ROI delineation was possible in the normal TZ for 7 participants and in the normal PZ for 4 participants. Nine men had index lesions, all of which had PI-RADS ≥ 3 (3/9 (33.3%) = 3, 4/9 (44.4%) = 4, 2/9 (22.2%) = 5). Three of the 9 lesions underwent targeted biopsy, all showing clinically significant prostate cancer with Gleason Grade ≥ 3 + 4 (1/3 (33.3%) = 3 + 4, 1/3 (33.3%) = 4 + 3, 1/3 (33.3%) = 4 + 5). The PI-RADS scores for the biopsied lesions were as follows: 2/3 (66.7%) = PI-RADS 4, 1/3 (33.3%) = PI-RADS 5.
Mean wSD and mean ADC for each b-value combination and location are given in Table 4. In a model assessing the effect of ADC magnitude on wSD, no statistically significant relationship was found (p > 0.05); estimates of wSD were therefore constant over the range of ADC values, and wSD was deemed a more appropriate repeatability metric than wCV.
As shown in Table 4, b0 b1500 provided the lowest wSD in all regions in the non-repositioned cohort and for index lesions within the repositioned cohort, while b0 b2000 provided the lowest wSD for normal PZ and TZ in the repositioned cohort. However, the difference between these wSDs never reached statistical significance (p > 0.05). Given that b0 b1500 provided the lowest wSD in the majority of regions, this b-value combination was used for further analysis.
Pooling the data from all regions, a GLM showed that the size of the ROI was related to the wSD (p = 0.02). It can be seen from Fig. 5 that as the number of voxels increases, the log wSD decreases, with an adjusted regression coefficient of −1.6 × 10⁻³ (95% CI: [−3, −0.3] × 10⁻³). This was close to the −1.4 × 10⁻³ regression value that would be expected for the dependence of random noise on the number of samples (1/√VoxelCount). The ROI color-coding by zone shows that index lesions tended to have the lowest voxel counts (clustering below 100 voxels), followed by normal PZ and then normal TZ, which had the largest region sizes.

Discussion
ADC forms part of the mpMRI protocol and has proved valuable in prostate cancer detection and grading [21,22]. The VERDICT protocol has demonstrated utility in prostate cancer grading [16] and also allows generation of ADC surrogate metrics as the sequence consists of multiple b-values with variable diffusion times. Given the dependence on DWI scan parameters, in the current PI-RADS recommendations, which have been rather broad, derived ADC has only been used qualitatively. Recommendations do not include specific guidance on mixing diffusion and echo times outside of utilising TE ≤ 90 ms. Determining which b-value range provides repeatable ADC is important if the quantitative value is to be used for discrimination of clinically-significant cancer from benign changes and low-grade disease [3].
This study compared ADC repeatability (quantified by wSD) from different b-value combinations in prostate tissue using the multi b-value VERDICT sequence with variable diffusion times, for subjects who were scanned twice. This study provides the largest repeatability cohort (n = 41) published to date, sufficient [13] to allow analysis of repeatability dependence on acquisition parameters, region size and location. Image quality was deemed adequate for all ROIs included in the final analysis. Although VERDICT b-values do not include 800-1000 s/mm² as recommended by PI-RADS, the ADC repeatability trends derived for lower and higher b-values may inform both future repeatability investigations and, ultimately, technical acquisition and analysis recommendations.
Among the five studied b-value combinations, the repeatability was best for ADC from b0 b1500, and was unaffected by ADC magnitude, location, and lesion Likert score. There was a relationship with ROI size; an increase in the number of voxels showed a reduction in variability for normal TZ, compared to normal PZ and index lesions (which had the lowest sizes and highest wSD, consistent with an increased random noise effect on ADC). Compared to normal PZ and TZ, the observed repeatability was lower (variability higher) for index lesions, but marginally better than reported in the literature for protocols that used lower b-values, which would allow technical performance thresholds to be established by pooling data across lesions.
It was demonstrated that ADC with bmax = 500 s/mm² had the highest variability, indicating low repeatability. The lower repeatability observed for ADC generated from bmax = 500 s/mm² may be attributed to perfusion effects. At lower b-values the diffusion signal is influenced by fast pseudo-diffusion and pulsatile perfusion effects [23]. The PI-RADS recommendations acknowledge that perfusion information may be obtained using b-values: 0-500 [24]. The pseudo-diffusion coefficient attributed to this relationship has previously been shown to have poor repeatability and high variability in various organs, including prostate [25-28]. Furthermore, b500 may not provide sufficient contrast-to-noise for slow diffusion in prostate tissue at the moderately long diffusion times used by the VERDICT protocol.
Larger ROI sizes increased repeatability, possibly due to reduced noise effects on the mean ADC and reduced registration errors between the two scans, as greater areas and more voxel ADC samples were analysed [29]. Therefore, care should be taken when analysing smaller lesions, especially in the PZ where most cancerous lesions are sited.
To reduce kurtosis effects, seen at higher b-values, PI-RADS recommends using bmax = 1000 for ADC calculations [2]. VERDICT does not include b1000; however, the present study is still in line with QIBA guidelines, which recommend bmax = 500-1500 [5]. To reduce variability of multi b-value acquisitions and comply with mono-exponential ADC model assumptions, it is essential to use consistent TEs, which was not possible for retrospective analysis of VERDICT acquisitions in this study. To alleviate inconsistent TE, the studied b-value combinations were limited to those within 15 ms difference. Adhering to PI-RADS recommendations (TE ≤ 90 ms), shorter diffusion times for high b-values would also help reduce SNR bias in derived mean ADC values.
An intrinsic limitation of this study is that VERDICT uses different TRs and TEs for different b-values, which may not be optimised for prostate ADC measurement. Mean ADC values were highly variable across the different b-value combinations. Only b2000 has TR ≥ 3,000 ms, which is the current PI-RADS and QIBA recommendation [2,5]. A longer TR is needed to alleviate T1 effects for ADC; however, VERDICT is optimised for prostate cancer detection and grading based on the model that quantifies extra- and intra-cellular diffusion fractions, rather than ADC [15]. As VERDICT is clinically promising [16], it was beneficial to investigate whether this sequence could also generate repeatable ADC maps without modifying acquisition parameters, by retrospectively selecting b-value combinations with close TEs.
The lower variability observed for the higher b-value combinations may be related to increased b-value averaging, which is needed due to the decrease in signal with increase in b-value (b90 = 4 averages; b0, b500, b1500 and b2000 = 6 averages). This may not be practical for clinical protocols due to the increased scan time. Additionally, high repeatability does not guarantee clinical diagnostic value. However, utilising > b1000 for ADC has shown promise in discrimination between high and low grade prostate tumours [9]. Our results indicated a marginally better correlation of ADC with Likert score for b0 b500 b2000 than for b0 b1500, which had better repeatability. The clinical choice of ADC acquisition will need to balance improved stability against improved discriminatory power.
Another limitation, due to the retrospective nature of the analysis, is that the test-retest scans were performed on a single system during a single exam, with the majority of patients not undergoing repositioning. This only provided short-term ADC repeatability values. These estimates could be optimistic for long-term or multi-site prostate ADC studies. However, other research (using a lower bmax = 800 s/mm²) has shown that test-retest repeatability with repositioning and multi-day reproducibility are largely equivalent [30]. Furthermore, the acquired data was sufficient for the purpose of the present study, which was to evaluate relative repeatability trends across b-value protocols, prostate zones, region sizes, and ADC values.
Additionally, the subset of patients who were repositioned showed similar repeatability results and thus added support for assessed repeatability values. Repeatability of index lesions was similar to that observed for significant cancers confirmed by biopsy.
In summary, ADC maps formed from the b0 b1500 combination of the VERDICT acquisition had the least variability. This was found in normal TZ, normal PZ, and prostate cancer lesions. Repeatability increased with ROI size and was highest for normal TZ, followed by normal PZ and then index lesions, consistent with reduced noise for mean ADC. The index lesion mean ADC was moderately negatively correlated with Likert score. Importantly, lesion wSD did not increase with ADC magnitude, hence wSD was a more appropriate repeatability metric than wCV (normalised by mean ADC) for prostate ADC. Future studies should address the limitation of inconsistent acquisition parameters to optimize ADC acquisitions and should evaluate the effect of scan resolution for small lesions. Having low variability makes it easier to ascertain whether there have been clinically significant ADC changes in longitudinal therapy response studies and active surveillance prostate imaging programs.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Funding Information:
This work was directly supported by Prostate Cancer UK: Targeted

Table 3 The 4b-value combinations used to form the ADC maps.

Table 4 Mean wSD and Mean ADC and Standard Deviations for each b-value combination in the different locations for the non-repositioned cohort (n = 31) and