UK quantitative WB-DWI technical workgroup: consensus meeting recommendations on optimisation, quality control, processing and analysis of quantitative whole-body diffusion-weighted imaging for cancer

Objective: Applications of whole body diffusion-weighted MRI (WB-DWI) for oncology are rapidly increasing within both research and routine clinical domains. However, adoption of WB-DWI as a quantitative imaging biomarker (QIB) has been significantly slower. To date, challenges relating to accuracy and reproducibility, essential criteria for a good QIB, have limited widespread clinical translation. In recognition of this, a UK workgroup was established in 2016 to provide technical consensus guidelines (to maximise accuracy and reproducibility of WB-MRI QIBs) and accelerate the clinical translation of quantitative WB-DWI applications for oncology. Methods: A panel of experts with subspecialty expertise in quantitative imaging and/or the use of WB-MRI with DWI was convened from cancer centres around the UK. A formal consensus method was used to obtain agreement regarding best practice. Questions were asked about the appropriateness or otherwise of scanner hardware and software, sequence optimisation, acquisition protocols, reporting, and ongoing quality control programs to monitor precision and accuracy, and about agreement on quality control. Results: The panel reached consensus on 73% (255/351) of items and, based on the areas of consensus, made recommendations to maximise accuracy and reproducibility of quantitative WB-DWI studies performed at 1.5T. The panel was unable to reach consensus on the majority of items related to quantitative WB-DWI performed at 3T. Conclusion: This UK Quantitative WB-DWI Technical Workgroup consensus provides guidance on maximising accuracy and reproducibility of quantitative WB-DWI for oncology. The consensus guidance can be used by researchers and clinicians to harmonise WB-DWI protocols, which will accelerate clinical translation of WB-DWI-derived QIBs.

Table 1. +, agree but no consensus; *, agree with consensus; **, strongly agree with consensus

Harmonisation
One acquisition protocol should be created to cover all quantitative (response) assessment of metastatic disease ("one size fits all")

Introduction
Whole body MRI, including diffusion-weighted imaging (DWI), offers significant advantages over other cancer imaging modalities, combining high soft tissue contrast and adaptable spatial resolution with "functional" imaging, without exposure to ionising radiation. Fuelled by recent technological advances, the use of whole body MRI in oncology is rapidly increasing, both for clinical research and for routine clinical imaging of specific indications (e.g. multiple myeloma). 1 Whole body DWI (WB-DWI) enables assessment of RECIST non-measurable disease foci such as bone metastases 2 and, as well as depicting anatomy, DWI provides microstructural information 3 that can be correlated with tissue metabolism. 4 Quantitative evaluation of DWI provides a measure of the average apparent diffusion coefficient (ADC) of water within a voxel. Differences in ADC between voxels reflect differences in the cellular composition (microstructure) of individual voxels.
Despite many recent publications demonstrating the potential of ADC to act as a quantitative imaging biomarker (QIB) for oncology, 5-11 WB-DWI-derived ADC assessment has not become widely used in clinical practice, and has struggled to be adopted as the primary endpoint biomarker in multicentre trials. Key factors limiting the generalizability of WB-DWI include: (i) the complexity of optimising WB-DWI protocols to produce artefact-free images; (ii) the lack of standardization of WB-DWI acquisition parameters; and (iii) heterogeneity in the derivation and interpretation of ADC values. 12 To address these factors and promote research and clinical applications of WB-DWI, there is a need to agree hardware and software requirements, and to provide sequence optimization, acquisition, reporting and quality control (QC) guidance for maximising accuracy and reproducibility. 11-16
To address this challenge, a group of experts (UK Quantitative WB-DWI Technical Workgroup) was convened from cancer centres around the UK with subspecialty expertise in quantitative imaging and/or the use of WB-DWI within clinical and/or research practice.
This document is the output from this group and is intended to act as guidance for clinicians, radiographers and MR physicists/clinical scientists who are considering development/implementation of quantitative WB-DWI (qWB-DWI) for clinical trial or routine clinical use. The main recommendations from this review are listed in Tables 1-3.

Methods and materials
The consensus method
A consensus approach was developed based on the RAND/UCLA Appropriateness Method. This approach aimed to obtain consensus agreement regarding best practice for the implementation of qWB-DWI (http://www.rand.org/pubs/monograph_reports/MR1269.html). The method includes a combination of remote and face-to-face consensus rounds and combines the best available scientific evidence with the collective judgment of experts to yield statements regarding the appropriateness of relevant aspects of the topic under investigation. It is particularly suited to areas with a relative paucity of high quality level 1 evidence (e.g. randomized controlled trials), or where the available evidence does not contain sufficient detail to guide practice applicable to the range of patients seen in everyday clinical practice.
When using this method, appropriateness levels are used to communicate the perceived balance between risks/costs and benefits of each item under discussion. Throughout the process, our approach followed the RAND/UCLA Appropriateness Method as set out in the user's manual as closely as possible. 17

Panel selection
Leading clinicians, radiographers and scientists from the UK (MR physics, MR radiology, nuclear medicine, radiotherapy physics, oncology), with known subspecialty expertise in quantitative imaging and/or the use of imaging to inform treatment, were approached. An independent chair was selected, with experience of using formal consensus methods to develop clinical guidelines. A total of 25 panel members were confirmed.

Construct of the questionnaire
An extensive questionnaire containing 369 items was constructed between August and November 2015. The first draft was produced by four panel members with a background in MR physics.
The questionnaire was split into six main areas for consideration, initially defining the scope and requirement for (i) harmonisation, followed by specific items needing consensus to achieve the harmonisation goals: (ii) hardware specification, (iii) optimization, (iv) routine quality assurance (QA), (v) acquisition parameters and (vi) analysis and visualisation requirements. Within each section, questions differentiated requirements for multicentre trials versus routine clinical practice, and between 1.5 and 3T magnetic field strengths.
(i) Harmonisation: This section was structured to identify critical requirements for a qWB-DWI protocol, and thereby provide a common standard, extending from desired applications to analysis and the personnel needed to implement the technique.
(ii) Hardware Specifications: This section asked questions to ascertain the minimum specifications that an MR scanner must have in order to acquire qWB-DWI and how to deal with variability in MRI hardware across sites.
(iii) Site optimisation: These questions were split into two groups: one for optimising a clinical service and the second for a trial site.
(iv) Routine QA: These questions were split into two groups: one for routine QA and control for a clinical service and the second for a trial site.
analysis of the images including the processing of the data, such as intensity thresholding or motion correction prior to the ADC calculation, as well as how the data are visualised in the most useful way for a radiologist to report.

First-round questionnaire completion before the meeting
All panel members were sent the questionnaire as an online survey (opened 29 February 2016, to be completed by 24 March 2016), together with relevant literature on the RAND/UCLA Appropriateness Method and technical articles on quantitative WB-DWI via the project website (https://sites.google.com/site/wbadcconsensus/home). They were instructed to score each item on a Likert scale 18 between 1 (strongly agree) and 5 (strongly disagree). A midpoint score of 3 indicated not necessary (or it does not matter), and a further category 6 (do not know) allowed members to indicate that they did not have sufficient expertise to answer the question.
The questionnaire responses were collected and summarised by the consensus co-ordinator. A complete list of questions and modal answers is given in Appendix A. Cells are colour coded grey if no consensus was reached; the text indicates the modal answer of the panel.

Face-to-face meeting format
The meeting was convened for one day in London, on 12 April 2016. Twenty-three panelists attended (2 were unable to attend the face-to-face discussions). The co-ordinator convened the meeting and documented key points of discussion. The whole meeting was audio recorded so that points in the discussion could be checked when preparing the manuscript.
At the beginning of the consensus meeting, selected expert panel members presented on the following topics: DWI in oncology, QA in DWI and practical aspects of performing whole-body DWI. Speakers were asked to summarise the evidence in the given area and to highlight areas of controversy.
Thereafter, for each individual question included in the questionnaire, a summary of the panel scores was presented and the topic discussed by the panel. After the discussion, the panelists rescored that item and were free to maintain or change their original response from the prior online completion stage.
Four questions were reworded during the panel discussion to improve clarity. Six questions were added and scored during the meeting. Twelve questions were removed during the meeting. In particular, it was decided to remove questions 6 and 7 regarding the use of whole body imaging as a tool for the assessment of other diffuse disease such as inflammation, i.e. there was consensus on using this technique only for widespread oncological disease. Questions 57 and 63 were also removed, since it was felt that the use of open bore magnets to measure quantitative WB-DWI was a moot point, i.e. there was consensus not to use this type of scanner. Questions 360-367 were removed owing to the panel's general agreement that there were currently insufficient data to answer them, either on the usefulness of percentage change in ADC values as a response criterion or on the point post-treatment at which it should be measured, i.e. there was consensus that this should not be used as a reporting criterion. Questions 94-97 regarding the tests to be performed during acquisition optimisation were added during the discussion. One more question was added to the routine QA/QC section to allow the option of performing QC tests quarterly, and four items that included the title "clinical scientist" were changed to "appropriately trained personnel". In all cases, these changes were made with full agreement of the members of the panel and scored during the meeting.

Interpretation of the results
The results of the second round of scoring were interpreted according to the RAM user's manual, 17 i.e. only those items scored (on the scale 1-5 and not 6) by at least eight panel members were included in the results (every single item met this criterion). Consensus was defined as described in the user manual and listed below in Table 4.
The modal answer is calculated for each question and then the number of answers outside of the 2-point range that includes the
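The scoring rule outlined above (modal answer, then a count of answers falling outside a 2-point range containing the mode) can be illustrated with a short sketch. This is an illustration only, not the panel's actual procedure: the exact consensus definition is given in Table 4 and is not reproduced here, so the threshold `MAX_OUTSIDE_FRACTION`, the tie-breaking rule and the choice between the two candidate 2-point ranges are all assumptions made for the example. The function name `summarise_item` is likewise hypothetical.

```python
from collections import Counter

DONT_KNOW = 6                  # category 6 ("do not know") is excluded from scoring
MIN_SCORERS = 8                # minimum number of valid scores per item (from the text)
MAX_OUTSIDE_FRACTION = 1 / 3   # HYPOTHETICAL threshold; the real rule is in Table 4


def summarise_item(scores):
    """Return (mode, n_outside, consensus) for one questionnaire item.

    scores: iterable of integers, 1 (strongly agree) to 5 (strongly disagree),
    or 6 (do not know).
    """
    valid = [s for s in scores if s != DONT_KNOW]
    if len(valid) < MIN_SCORERS:
        return None, None, False

    counts = Counter(valid)
    # Modal answer; ties are broken towards the lower score (an assumption).
    mode = min(counts, key=lambda s: (-counts[s], s))

    # Candidate 2-point ranges containing the mode: (mode-1, mode) and (mode, mode+1).
    windows = [(lo, lo + 1) for lo in (mode - 1, mode) if lo >= 1 and lo + 1 <= 5]
    # Assumption: use whichever range captures the most answers.
    inside = max(counts[a] + counts[b] for a, b in windows)
    n_outside = len(valid) - inside

    consensus = n_outside / len(valid) <= MAX_OUTSIDE_FRACTION
    return mode, n_outside, consensus
```

For example, `summarise_item([1] * 15 + [2] * 5 + [4] * 2 + [6])` discards the single "do not know" response, finds a modal answer of 1 and counts two answers outside the range 1-2.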

Results
The Supplementary material (available online) includes the complete list of questions and modal scores before and after the panel discussion. Cells colour coded white indicate that consensus was reached and grey indicates that it was not. The text indicates the value of the modal answer within the group.
Consensus was achieved on 197/369 items prior to, and 255/351 items following, the consensus meeting. Tables 1-3 list the items whose modal answer was agree (*) or strongly agree (**) with consensus, or agree with no consensus (+). Areas for which consensus was not reached are listed in Table 5, where the modal answer was not necessary (o) or disagree (x).

Harmonisation
Table 1a highlights the consensus amongst the expert panel on the general need for harmonisation of MR protocols, optimisation, QC and post-processing of qWB-DWI studies.
Although aspirational, the ability to compare ADC values (without reference to the hardware or software by which they were acquired) was unanimously agreed as a goal of harmonisation. The panel defined the scope of the consensus as pertaining to the use of qWB-DWI for oncological imaging rather than the assessment of other diffuse diseases, such as arthritis. Answers to questions in each subsequent section pertain to the aspiration and scope as defined by the panel. On the particular question "a clinical scientist should perform the site set-up optimisation", the panel agreed that, because "clinical scientist" is a protected professional title, it should be replaced by "an appropriately trained person". During further discussion, complete agreement was also reached that consistent and comprehensive training in specialist quantitative MRI techniques will result in high and consistent image quality. A national training program could provide recommendations based on this consensus paper and practical hands-on experience of: site set-up and optimisation; routine QA; acquisition/scanning protocols; patient set-up; quantitative metrics; and statistics.

Hardware specification and acquisition protocol/optimisation protocol/routine QA
Table 2 provides statements relating to consensus on specific acquisition parameters, optimization procedures and QA when setting up quantitative WB-DWI. Items that are bolded in Table 2 reached consensus at both 3T and 1.5T field strengths for routine clinical and multicentre trial applications; non-bold items achieved consensus at 1.5T only, for routine clinical scanning and multicentre trials. Figure 1 shows a typical whole body MRI scan patient set-up for the three main scanner manufacturers.

Hardware specification and acquisition protocol
The majority of the panel (those already familiar with acquiring qWB-DWI) felt strongly that robust qWB-DWI at 1.5T was achievable, but it was acknowledged (Table 5) that there are challenges in translating these protocols to 3T platforms and that more evidence needs to be gathered on quantitative applications at 3T. Panel consensus reflected previous recommendations 12,19-22 that WB-DWI is performed axially at multiple anatomical stations from head to mid-thigh (~4-5 sections), each acquired using the same MRI parameters and taking approximately 30 min in total. Maximising the SNR of DWI within a 30 min time-frame will typically limit scans to the acquisition of two b-values only: 50-100 and 800-1000 s mm⁻². However, some current scanner platforms collect b = 0 s mm⁻² by default in order to calculate ADC maps at the scanner console. The panel therefore recommends two options: (i) collect three b-values (at the expense of increased acquisition time), where the second b-value should be between 50 and 200 s mm⁻²; or (ii) collect two b-values (not b = 0 s mm⁻²) and perform the ADC calculation offline. Fat suppression should be used to remove unwanted signal from fat, and multiple averages of each image should be acquired. Anecdotal evidence from panel members who frequently use WB-DWI supported an additional recommendation that the patient breathes freely during the diffusion-weighted acquisition. Basic acquisition parameters provided in Table 2 were agreed as a starting point for optimization at sites that have not previously performed WB-DWI. It was also agreed that, if it is not possible to implement even these basic parameters, sites should not attempt quantitative WB-DWI, particularly for the purposes of multicentre trials.
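For sites choosing to perform the ADC calculation offline, the following is a minimal sketch of a mono-exponential fit, assuming the standard signal model S(b) = S0 exp(-b × ADC). It is an illustration rather than a validated implementation: the function name `fit_adc` and the clipping constant `eps` are hypothetical, and no noise-floor handling, intensity thresholding or motion correction is included. With two b-values the least-squares slope reduces to ln(S_low/S_high)/(b_high - b_low); with three b-values the same code performs a log-linear fit.

```python
import numpy as np


def fit_adc(signals, b_values, eps=1e-12):
    """Mono-exponential ADC estimate from diffusion-weighted signals.

    signals  : array of shape (n_b, ...), one image (or value) per b-value
    b_values : sequence of n_b b-values in s/mm^2
    Returns the ADC in mm^2/s, with the shape of a single image.
    """
    s = np.asarray(signals, dtype=float)
    b = np.asarray(b_values, dtype=float)
    if s.shape[0] != b.size or b.size < 2:
        raise ValueError("need one image per b-value and at least two b-values")

    # Clip to avoid log(0) in background voxels (eps is an arbitrary small value).
    log_s = np.log(np.clip(s, eps, None))

    # Least-squares slope of ln(S) against b; the ADC is minus that slope.
    b_centred = (b - b.mean()).reshape((-1,) + (1,) * (s.ndim - 1))
    slope = (b_centred * (log_s - log_s.mean(axis=0))).sum(axis=0) / (b_centred ** 2).sum()
    return -slope
```

A typical call under these assumptions would be `adc_map = fit_adc(np.stack([img_b50, img_b900]), [50, 900])`, where `img_b50` and `img_b900` are placeholder names for co-registered images at the two b-values.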
Although the panel supported the use of 3T for WB-DWI, it noted that there is currently insufficient evidence to recommend standardised basic acquisition protocols. Instead the panel recommends that a suitably trained person should carefully optimise all acquisition parameters listed in Table 2 for any particular 3T system.

Protocol optimisation:
Consensus recommendations for DWI as an oncology biomarker state that protocols should be "optimised to maximise SNR, minimise artefacts from ghosting and distortion and optimise fat suppression". 11 This panel reached the same consensus. For multicentre trials, the panel recommended, as essential, protocol development using the parameters listed in Table 2 as a starting point, with the appropriate phantoms listed to interrogate the effects of eddy current-induced distortion and fat suppression. The reader is directed to an excellent review of these optimisation steps and their practical implementation by Winfield et al. 23

Routine QA:
There was a clear distinction in the number of QA items reaching consensus between routine clinical scanning and multicentre trials (3 vs 8 out of 10). While performing additional specific QA tests to support a multicentre trial was considered mandatory, there was no consensus on the need for additional specific QA in routine clinical practice. In general, the panel felt that most MRI departments in the UK will have a regular routine QA strategy together with a preventative maintenance contract with manufacturers/third party providers, which, with optional additional coil checks on a daily/weekly basis (time and staff permitting), should be sufficient for routine clinical applications. The panel's recommendation was to carry out the routine QC tests listed in Table 2 (measurements of eddy current distortion, ghosting and ADC linearity in the z-direction) for both routine clinical practice and multicentre trials every 3 months and/or after major software or hardware upgrades or repairs. The panel cited the work of the American College of Radiology and the American Association of Physicists in Medicine (AAPM) on the practice and interpretation of regular QC tests as a good place to start. 20,21

Processing and analysis/visualisation and reporting
Table 3 lists statements on which positive consensus was reached relating to processing, analysis, visualisation and reporting of quantitative WB-DWI. The items that are bolded and italicized in Table 3 are applicable to both staging and response assessment oncological applications; otherwise they are only applicable to response assessment.

Processing and analysis:
This section achieved the fewest items of consensus both before and after the panel meeting. Those items on which the panel did reach consensus can be summarised as "keep it simple". As with other quantitative imaging techniques, no consensus was reached with regard to the methodology for delineation of disease in WB-DWI, whether manually or by some automated/semi-automated segmentation process. Nor was there consensus on whether whole-body disease burden was preferred over a RECIST-inspired approach of five target lesions as suggested by Perez-Lopez et al. 22 By extension, there was a lack of consensus on which, if any, ADC statistics should be obtained from within the delineated disease (e.g. standard deviation, kurtosis or skewness of ADC values within the tumour volume), or on the use of the tumour volume itself. The panel felt that, whilst some published literature has demonstrated preliminary evidence that such statistics offer quantitative approaches for assessment of patient prognosis 24 and heterogeneous treatment response, 25,26 these methods are still in their infancy and require further validation before consensus can be achieved. All major manufacturers provide workstation applications for post hoc analysis: Philips Healthcare (Vistar), Siemens Healthineers (Syngo.Via) and GE Healthcare (AW). There are also several third party applications, including OsiriX (Pixmeo, Geneva, Switzerland) and Mirada Medical (Oxford, England), as well as open-source solutions such as ImageJ (National Institutes of Health, USA). All provide ADC map calculations, ROI/VOI toolboxes and image intensity thresholding applications, as well as summary statistic extraction tools. While nuclear medicine applications have been used for decades to assess quantitative 3D sectional imaging, mainstream radiology modalities such as X-ray CT and MRI have traditionally been reviewed on picture archiving and communication system (PACS) viewing stations with little or no added functionality.
In truth, until this functionality has been added to PACS there will be a slow uptake in busy clinical National Health Service (NHS) imaging departments of specialty reporting tools that require additional workstations.
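To make concrete the kind of summary statistics discussed in this section (mean, standard deviation, skewness and kurtosis of ADC within a delineated volume, plus the volume itself), a minimal sketch is given here. It is not a recommended analysis method, since the panel reached no consensus on which statistics to report. The function name `adc_summary` and its arguments are hypothetical, the mask is assumed to come from whatever manual or automated delineation a site uses, and the moment definitions (population standard deviation, excess kurtosis) are one common convention among several.

```python
import numpy as np


def adc_summary(adc_map, mask, voxel_volume_ml=None):
    """Summary statistics of ADC values inside a delineated region.

    adc_map         : array of ADC values
    mask            : boolean array of the same shape, True inside the region
    voxel_volume_ml : optional volume of one voxel in ml, to report region volume
    """
    values = np.asarray(adc_map, dtype=float)[np.asarray(mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("mask selects no voxels")

    mean = values.mean()
    sd = values.std()                      # population standard deviation
    centred = values - mean
    if sd > 0:
        skewness = (centred ** 3).mean() / sd ** 3
        kurtosis = (centred ** 4).mean() / sd ** 4 - 3.0   # excess kurtosis
    else:
        skewness = kurtosis = float("nan")

    stats = {
        "n_voxels": int(values.size),
        "mean": float(mean),
        "median": float(np.median(values)),
        "sd": float(sd),
        "skewness": float(skewness),
        "kurtosis": float(kurtosis),
    }
    if voxel_volume_ml is not None:
        stats["volume_ml"] = values.size * voxel_volume_ml
    return stats
```

Because skewness and kurtosis are sensitive to how the region is drawn and to the number of voxels it contains, any comparison between time points or sites would need the delineation method to be held fixed.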

Visualisation and reporting:
The paper by Padhani et al 11

Given the increasing number of sites with both a research and clinical interest in WB-MRI within the UK, this consensus was aimed specifically at deriving a UK-specific strategy on harmonising quantitative WB-DWI. However, many of the panel members have previously taken part in international consensus panels and have an international reputation, providing a non-UK-centric perspective on the subject, and the outcomes of a UK approach to harmonisation will also have relevance to other national and international organisations developing such guidelines.
The need for image registration for longitudinal reporting +

Broadly, areas of consensus were: the need for specialist training; standardised guidelines for initial optimisation of sequence parameters; standardised routine QC tests and test objects; standardised acquisition parameters to ensure best SNR and patient comfort; the value of widespread use of quantitative WB-DWI for follow-up imaging in disseminated malignant disease; the use of the mono-exponential model for calculation of ADC; and the use of summary statistics of ADC values for reporting, although ADC thresholds for "treatment response" and "disease progression" still need to be established. Consensus recommendations based on these areas are listed in Tables 1-3. Overall, a good level of agreement was achieved for quantitative WB-DWI at 1.5T, but it was acknowledged that there are challenges in translating these protocols to 3T platforms and that more evidence from experienced users is needed. In particular, there is a need to understand the role of the latest technologies in optimising the DWI signal: parallel imaging, fat suppression techniques, eddy current distortion corrections and the handling of static/dynamic field inhomogeneities. Areas for which consensus was not reached (Table 5) included: implementation of the latest technology and associated novel analysis methods; the exact nature of the routine QC tests for routine clinical practice, how they should be performed and how often; and the metrics used to measure, characterise (histogram and associated statistics) and visualise ADC maps.
There exists the criticism that "many imaging biomarkers, remain confined to the academic literature without real application owing to a lack of efficient and effective strategies for biomarker translation". 12 The European Organisation for Research and Treatment of Cancer (EORTC) position paper concluded that MRI offers a good "one size fits all" solution for patients who do not have substantial non-bone disease to assess therapy effectiveness, 30 and the recent interest in using MRI for radiotherapy treatment planning 31 and the advent of MRI Linacs 32 only highlight the need for establishing the robustness of this technique. This document has been prepared in answer to papers citing the need for comparative multimodal studies, to provide prospective quantitative data from treatment-response assessment settings. 27 This paper describes the use of the RAND style method 17 to achieve consensus on a range of aspects of a clinical procedure, quantitative WB-DWI, in order to obtain best practice that can be shared amongst the diagnostic radiology community in the UK. This publication should be received as a starting point for sites developing quantitative WB-DWI protocols that can then contribute to multicentre studies and enable clinical studies for specific emerging indications (e.g. multiple myeloma).

Future work
The panel recognized the need to develop and promote training opportunities for radiologists, MR physicists and radiographers specifically for the implementation, QA, reporting and analysis of whole body DWI for quantitative measures.
The group expects to produce a library of useful QA procedures and tests and establish tolerances for each of these tests across scanner platforms, such that the user of a specific platform (a) can determine whether their platform is performing within acceptable tolerances in order to be able to apply quantitative whole body DWI for clinical decision-making and (b) can take remedial measures, where possible, to correct any out of tolerance results. It is the intention of the first author (supported by an NIHR fellowship) to co-ordinate and gather further data from already participating clinical trial centres 33 to establish these tolerances.
There is also much-needed, ongoing, NIHR-funded work developing post-processing techniques (e.g. those originally devised on T1- and T2-weighted data of the brain 34,35) and MR manufacturer-independent analysis tools to visualize ADC maps, 36 the output of which is expected to standardize methods and help deliver the QC tests recommended in Table 2.
In summary, using the RAND/UCLA consensus method, a UK-based panel was able to make recommendations to provide a robust and reproducible quantitative WB-DWI protocol suitable for 1.5T, to be used routinely to evaluate disseminated cancer before and after treatment. It is the panel's intention to meet again in 3 years' time to update the document.