Automatic Prediction and Assessment of Treatment Response in Patients with Hodgkin’s Lymphoma Using a Whole-Body DW-MRI Based Approach

The lack of validation and standardization is the main obstacle to defining a clear role for whole-body diffusion weighted imaging (WB-DWI) in the prediction and assessment of treatment response in Hodgkin’s lymphoma (HL). We explored the reliability of an automatic approach based on the WB-DWI technique for prediction and assessment of response to treatment in patients with HL. The study included 20 HL patients who underwent whole-body positron emission tomography (PET)/magnetic resonance imaging (MRI) before, during, and after chemotherapy. Using the syngo.via MR Total Tumor Load tool, we automatically extracted the diffusion volume (DV) and its associated histogram features from WB-DWI images, and evaluated their utility in predicting and assessing interim and end-of-treatment (EOT) response. The Mann–Whitney test followed by receiver operating characteristic (ROC) analysis was applied to the features and to their inter-time-point percentage differences to compare patients with a complete or partial treatment response, revealing that several WB-DWI associated features allowed for assessment of interim response and for both prediction and assessment of EOT response. The proposed method may save considerable time and work by enabling clinicians to draw conclusions about HL treatment response in a fully automatic way, and it retains the advantages of DWI over PET/computed tomography (CT).


Introduction
Hodgkin's lymphoma (HL) is a relatively uncommon B-cell derived tumor, in which the unique cellular microenvironment is crucial for accurate diagnosis and pathobiology [1][2][3]. Diagnostic imaging provides important information for an accurate pretreatment evaluation and for the assessment of response to treatment, which are crucial steps in the management of HL patients. In particular, the hybrid technique positron emission tomography (PET)/computed tomography (CT) with 18-fluorodeoxyglucose (18F-FDG) injection is considered the gold standard for HL management, from initial diagnosis to staging and assessment of response to treatment using the imaging-based Lugano classification [4][5][6][7].
Nevertheless, especially because of the ionizing radiation dose delivered by both PET and CT, there is increasing interest in integrated PET/MRI, which combines the detailed morphological information provided by radiation-free MRI with the functional information that characterizes PET images. The excellent soft tissue contrast of MRI, together with the high sensitivity of PET, allows robust diagnostic evaluations. The role of MRI is strengthened by the possibility of including functional MRI techniques, such as diffusion weighted imaging (DWI), in the protocol. DWI is a noninvasive tool that quantifies the random motion of water molecules (diffusion), which is hampered in structures characterized by high cellularity, such as lymphoma lesions. Like PET/CT, whole-body DWI (WB-DWI) provides both anatomical and functional information. The characteristic DWI parameter is the apparent diffusion coefficient (ADC), which allows for a quantitative evaluation of changes in tissue cellularity, providing a useful tool for diagnosis and assessment of response to treatment in tumors, in particular lymphomas. Several studies have investigated the power of DWI for the diagnosis and assessment of treatment response of HL and non-Hodgkin's lymphoma (NHL) compared to PET/CT, showing the potential role of DWI for these purposes [8].
Despite continuous research and promising results on the usefulness of PET/MRI, especially when performed with DWI, the lack of validation and standardization represents the main drawback for a clear definition of the role of DWI in lymphoma diagnosis, staging, and response assessment. Moreover, for a valid evaluation of response to treatment according to current guidelines, readers need extensive experience and knowledge in the field of lymphomatous diseases and their radiological evaluation [9]. Thus, we explore the possibility of having an automatic tool capable of providing support in lymphoma diagnosis.
MR Total Tumor Load stems from the promising results obtained by WB-DWI and ADC in multifocal disease, such as bone metastases and multiple myeloma, and from the resulting need for an efficient tool able to evaluate this kind of lesion. This tool applies a threshold-based segmentation algorithm to whole-body diffusion-weighted images in order to identify regions of disease, and provides both the overall diffusion tumor volume and the histogram metrics of the corresponding computed ADC maps [10]. A high b-value image (acquired or computed) is used as input in order to maximize the contrast between lesions and healthy tissue. Several studies reported the advantages of using this tool, especially in metastatic bone disease, but also in metastatic breast cancer and metastatic prostate cancer [10][11][12][13]. Another recent study (including three case studies) showed the benefits of MR Total Tumor Load not only in metastatic bone disease, but also in solid tumors. In particular, in one of these case studies concerning Hodgkin lymphoma in a 14-year-old girl, the reduction in tumor volume and the increase of low ADC values in the ADC histogram between the pretreatment and the follow-up examination (after 2 months) are indicators of a good therapy response [14].
The purpose of this study is to evaluate the reliability of an automatic approach based on the MR Total Tumor Load tool for the WB-DWI technique in the prediction and assessment of response to treatment in patients with HL, and to identify the segmentation threshold best able to predict and assess response.

Patient Cohort
Twenty patients with histologically proven HL (11 men and 9 women), with a mean age of 35.7 ± 11.7 years, were selected for this retrospective study. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Istituto Nazionale Tumori "Fondazione G. Pascale" on 15 April 2020 (protocol number 3/20).
Inclusion criteria were patients being over the age of 18; histologic confirmation of HL at nodal biopsy; patients who underwent PET/CT followed by PET/MRI with WB-DWI at baseline before any treatment (T0), after two chemotherapy cycles (T1), and at the end-of-treatment (EOT) (T2), acquired from February 2016 to July 2018. All patients received doxorubicin (also known as Adriamycin), bleomycin, vinblastine, dacarbazine (ABVD) chemotherapy, and were asked to complete medical history questionnaires and sign informed consents to undergo hybrid PET-CT and PET-MRI investigations. Characteristics of included patients are shown in Table 1.

Acquisition Protocol
Data for all patients were acquired on both a PET-CT device and a 3T hybrid PET-MRI system (Biograph mMR, Siemens, Erlangen, Germany) equipped with three 32-channel body coils, to cover the thorax, abdomen, and pelvis areas, and 12-channel phased array brain coils. Patients were asked to observe a fast of at least six hours. Sixty minutes after the 18-fluorodeoxyglucose (18F-FDG) injection by antecubital access, a PET-CT examination was performed from the brain vertex to the pelvis region. Then, patients underwent a whole-body PET-MRI protocol, which consisted of the following sequences: coronal T2 Turbo Inversion Recovery Magnitude (TIRM); an axial DWI sequence with b values of 50 and 800 s/mm²; axial and coronal T2 Half Fourier Acquisition Single Shot Turbo Spin Echo (HASTE); and an axial T1 Gradient Echo (GRE) in-out phase. The scan parameters are shown in Table 2. Attenuation correction was obtained from a DIXON-based segmentation into four classes, with a predefined constant linear attenuation coefficient (LAC) for each class. The class denominations and the corresponding LACs were as follows: outer air (0 cm⁻¹), lung (0.022 cm⁻¹), fat tissue (0.085 cm⁻¹), soft tissue (0.1 cm⁻¹) [15]. PET images were reconstructed with the iterative ordered subset expectation maximization (OSEM) algorithm, using three iterations on a 172 × 172 matrix. The data were divided into 21 subsets analyzed cyclically, which made it possible to control the noise within low-absorption regions.
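As an illustration of the class-based attenuation correction described above, the following minimal Python sketch maps per-voxel tissue labels to the four constant LAC values listed in the text. The label names, the function `build_mu_map`, and the example labels are hypothetical and are not taken from the scanner software.

```python
# Linear attenuation coefficients (cm^-1) for the four DIXON classes,
# using the values quoted in the acquisition protocol.
LAC_PER_CLASS = {
    "air": 0.0,
    "lung": 0.022,
    "fat": 0.085,
    "soft_tissue": 0.1,
}


def build_mu_map(class_labels):
    """Map a sequence of per-voxel class labels to attenuation coefficients."""
    return [LAC_PER_CLASS[label] for label in class_labels]


# Hypothetical labels for four voxels (illustration only).
labels = ["air", "lung", "soft_tissue", "fat"]
mu_map = build_mu_map(labels)
```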

PET Response Evaluation
A radiologist and a nuclear medicine physician, with 7 and 10 years of experience, respectively, assessed response to treatment by consensus, examining PET images on the basis of the visual Deauville 5-point scale (5-PS) according to the Lugano classification criteria, in two sessions. In the first session, the interim response to treatment was assessed by evaluating PET images acquired at interim and comparing them to those acquired at baseline; in the second session, the EOT response was assessed by evaluating PET images acquired at the EOT and comparing them to those acquired at baseline. Patients were classified as having a complete metabolic response (CMR) in case of a 5-PS score of 1, 2, or 3 in lymph nodal and extralymphatic sites, with or without a residual mass, and no evident FDG uptake in marrow; a partial metabolic response (PMR) in case of a 5-PS score of 4 or 5 with reduced uptake compared with baseline, residual lesions of any size, and, relating to bone marrow, residual uptake higher than in normal marrow but reduced compared with baseline; stable metabolic disease (SMD) in case of a 5-PS score of 4 or 5 with no evident change in FDG uptake; or progressive metabolic disease (PMD) in case of a 5-PS score of 4 or 5 in any lesion with an increase in intensity of FDG uptake from baseline, and/or new FDG-avid foci consistent with lymphoma, as well as new or recurrent FDG-avid sites in bone marrow [4,7]. For cases with a Deauville score of 4 or 5 at T2, PMR, SMD, or PMD was defined considering also the interim PET scan.
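The classification rules above can be summarized in a short Python sketch. It is a simplification for illustration only: the function name `lugano_response` and its arguments are hypothetical, the bone marrow criteria are folded into the single `uptake_change` argument, and the consideration of the interim scan for EOT cases is not modeled.

```python
def lugano_response(deauville_score, uptake_change, new_lesions=False):
    """Simplified PET-based response category.

    deauville_score: integer from 1 to 5 (visual 5-PS score).
    uptake_change:   "reduced", "unchanged", or "increased" versus baseline.
    new_lesions:     True if new FDG-avid foci consistent with lymphoma appear.
    """
    if deauville_score not in (1, 2, 3, 4, 5):
        raise ValueError("Deauville score must be between 1 and 5")
    if uptake_change not in ("reduced", "unchanged", "increased"):
        raise ValueError("unknown uptake_change value")

    if deauville_score <= 3:
        return "CMR"  # complete metabolic response
    # From here on the score is 4 or 5.
    if new_lesions or uptake_change == "increased":
        return "PMD"  # progressive metabolic disease
    if uptake_change == "reduced":
        return "PMR"  # partial metabolic response
    return "SMD"      # stable metabolic disease


category = lugano_response(4, "reduced")
```

In this sketch the check for progression comes before the check for partial response, so a case with reduced uptake but new FDG-avid foci is still labeled PMD.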

WB-DWI Image Analysis and Data Extraction
The analysis of WB-DWI images at each time point was performed using the syngo.via Frontier MR Total Tumor Load released research prototype v1.3.3 (Siemens Healthineers, Erlangen, Germany). For each patient and each time point, the b800 images were used to automatically define masks with the threshold-based segmentation approach proposed by Blackledge et al. [16] and implemented in syngo.via Frontier MR Total Tumor Load [10] (see Figure 1). Six segmentation threshold values were used, namely 5%, 10%, 20%, 40%, 60%, and 80%. No subsequent mask editing was made.
Figure 1. Whole-body diffusion weighted imaging (WB-DWI) image analysis and data extraction process using syngo.via Frontier MR Total Tumor Load software. Using the WB-DWI images as input (1. Data loading step), b800 images were automatically segmented by setting a signal intensity threshold (e.g., 40%) at each WB-DWI acquisition time point (e.g., T0). No subsequent mask editing was made (2. Automatic segmentation step). The overall mask volume (diffusion volume, DV) and the corresponding apparent diffusion coefficient (ADC) histogram metrics associated with the masked volume were extracted (3. Data extraction step). Yellow arrows link the three processing steps.
The overall mask volume (diffusion volume, DV) and the corresponding ADC histogram metrics associated with the masked volume were extracted. Specifically, the extracted ADC histogram statistics were mean (ADCmean), standard deviation (ADCsd), median (ADCmd), 5th percentile (ADC5p), 95th percentile (ADC95p), skewness (ADCsk), excess kurtosis (ADCkurt), and entropy (ADCentr). The following notation was used in this study to indicate a single feature:

$$f^{T}_{th\%}$$

where the superscript T indicates the time point at which WB-DWI images were acquired (T = 0, 1, 2), and the subscript th% indicates the segmentation threshold used (th = 5, 10, 20, 40, 60, 80). For each of the six thresholds, percentage changes in parameters during treatment from baseline and after treatment from baseline were calculated as follows:

$$\Delta f^{0T}_{th\%} = \frac{f^{T}_{th\%} - f^{0}_{th\%}}{f^{0}_{th\%}} \times 100 \qquad (1)$$

where $f^{T}_{th\%}$ is the value of the feature at time point 1 or 2 and $f^{0}_{th\%}$ is the value of the feature at baseline. Percentage changes in parameters after treatment from their values during treatment were also calculated for each of the six thresholds as follows:

$$\Delta f^{12}_{th\%} = \frac{f^{2}_{th\%} - f^{1}_{th\%}}{f^{1}_{th\%}} \times 100 \qquad (2)$$

where $f^{1}_{th\%}$ is the value of the feature at interim and $f^{2}_{th\%}$ is the value of the feature at the EOT.
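To make the threshold-based masking step more concrete, the following Python sketch works on toy voxel lists. It rests on several assumptions that are not stated in the text: the percentage threshold is taken relative to the maximum b800 signal, ADC is computed with a mono-exponential model from the two acquired b-values, and the voxel volume is a made-up number. The function names are hypothetical and do not correspond to the MR Total Tumor Load software.

```python
import math

B_LOW, B_HIGH = 50.0, 800.0  # b-values in s/mm^2, from the acquisition protocol


def threshold_mask(b800, threshold_pct):
    """Flag voxels whose b800 signal is at least threshold_pct percent
    of the maximum b800 signal (assumed threshold definition)."""
    cutoff = max(b800) * threshold_pct / 100.0
    return [signal >= cutoff for signal in b800]


def adc_value(s_low, s_high):
    """Mono-exponential ADC in mm^2/s from the two b-value signals."""
    return math.log(s_low / s_high) / (B_HIGH - B_LOW)


def diffusion_volume_ml(mask, voxel_volume_mm3):
    """Diffusion volume (DV) in millilitres: masked voxel count times voxel volume."""
    return sum(mask) * voxel_volume_mm3 / 1000.0


# Toy signals for five voxels (illustration only).
b50 = [900.0, 800.0, 1000.0, 700.0, 600.0]
b800 = [100.0, 400.0, 800.0, 350.0, 50.0]

mask = threshold_mask(b800, 40)                       # 40% threshold
dv = diffusion_volume_ml(mask, voxel_volume_mm3=20.0)  # hypothetical voxel size
adc_in_mask = [adc_value(lo, hi) for lo, hi, keep in zip(b50, b800, mask) if keep]
```

Because no mask editing is applied, every voxel above the cut-off contributes to DV and to the ADC histogram, including normally hyperintense tissue.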
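The histogram statistics and percentage changes described above could be computed along the following lines. This is a sketch under stated assumptions: population (not sample) moments are used, percentiles use linear interpolation, and entropy is the Shannon entropy of a 16-bin histogram; the tool's actual conventions are not given in the text, and all function names are hypothetical.

```python
import math
import statistics


def percentile(sorted_vals, p):
    """Percentile (p in 0..100) of an already sorted list, by linear interpolation."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def histogram_entropy(vals, n_bins=16):
    """Shannon entropy (bits) of an equal-width histogram; n_bins is an assumption."""
    lo, hi = min(vals), max(vals)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in vals:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    n = len(vals)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def adc_histogram_features(adc):
    """Return the eight ADC histogram features for the masked voxels."""
    vals = sorted(adc)
    n = len(vals)
    mean = statistics.mean(vals)
    sd = statistics.pstdev(vals)
    if sd > 0:
        skew = sum((v - mean) ** 3 for v in vals) / n / sd ** 3
        kurt = sum((v - mean) ** 4 for v in vals) / n / sd ** 4 - 3.0
    else:
        skew = kurt = float("nan")  # undefined for constant data
    return {
        "ADCmean": mean, "ADCsd": sd, "ADCmd": statistics.median(vals),
        "ADC5p": percentile(vals, 5), "ADC95p": percentile(vals, 95),
        "ADCsk": skew, "ADCkurt": kurt, "ADCentr": histogram_entropy(vals),
    }


def percent_change(f_later, f_earlier):
    """Percentage change of a feature between an earlier and a later time point."""
    return (f_later - f_earlier) / f_earlier * 100.0


features = adc_histogram_features([1.0, 2.0, 3.0, 4.0, 5.0])  # toy ADC values
delta_01 = percent_change(150.0, 100.0)  # toy feature values at T1 and T0
```

With this sign convention, a feature that decreases from baseline yields a negative percentage change.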

Statistical Analysis
Values of each parameter were tested for normal distribution beforehand using a Kolmogorov-Smirnov test complemented by a graphical assessment of data normality using boxplots and Probability-Probability (P-P) plots, both overall and by the subgroups identified according to interim (T1) and EOT (T2) response. All normally distributed variables were summarized as mean (standard deviation), while non-normally distributed variables were summarized as median and interquartile range (Q1; Q3).
In order to predict interim and EOT response, the t-test (in case of normally distributed parameters) or, alternatively, the Mann-Whitney U test (in case of non-normally distributed parameters) on parameters at baseline, and also on parameters at interim for prediction of response at T2, was used to test the difference between CMR patients and each group of non-CMR patients (PMR, SMD, PMD). For prediction of EOT response, percentage changes between features at T1 and T0 ($\Delta f^{01}_{th\%}$) were also evaluated. For parameters significant on the t-test or Mann-Whitney test, receiver operating characteristic (ROC) curves were constructed and the area under the curve (AUC) was calculated to determine sensitivity and specificity and to find cut-off values that may be predictive of a poor response to treatment. The t-test or Mann-Whitney U test followed by ROC analysis for significant parameters was also performed on parameters at T1 and T2, as well as on percentage changes in parameters after treatment from baseline ($\Delta f^{02}_{th\%}$) and from interim ($\Delta f^{12}_{th\%}$), in order to evaluate their power in the assessment of interim and EOT response.
Since none of the patients showed PMD or SMD at interim, and the population size for patients with PMD at the EOT was too small to perform statistical analysis (only two patients), we performed analysis for prediction of response to treatment only comparing patients with CMR and those with PMR. Specifically, we compared CMR and PMR patients at interim for prediction of interim response to treatment, and CMR and PMR patients at the EOT for prediction of EOT response.
All statistical analyses were performed using MATLAB (R2018a, MathWorks, Inc., Natick, MA, USA). A p-value less than 0.05 was considered to indicate a statistically significant difference.
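The analysis itself was run in MATLAB; purely as an illustration of the workflow (Mann-Whitney U test followed by ROC analysis), the following self-contained Python sketch uses only the standard library. The exact permutation p-value and the Youden-index cut-off are choices made for this sketch, not details confirmed by the text, and the feature values are invented.

```python
from itertools import combinations


def u_statistic(group_a, group_b):
    """Mann-Whitney U for group_a: pairs with a > b count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in group_a for b in group_b)


def auc_from_u(group_pos, group_neg):
    """ROC AUC equals U divided by the number of (positive, negative) pairs."""
    return u_statistic(group_pos, group_neg) / (len(group_pos) * len(group_neg))


def exact_two_sided_p(group_a, group_b):
    """Exact two-sided p-value by enumerating all relabelings of the pooled data."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    center = n_a * len(group_b) / 2.0  # mean of U under the null hypothesis
    observed = abs(u_statistic(group_a, group_b) - center)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(a, b) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def youden_cutoff(group_pos, group_neg):
    """Cut-off maximizing sensitivity + specificity - 1 (value >= cut-off is positive)."""
    best = (-1.0, None, None, None)
    for cut in sorted(set(group_pos) | set(group_neg)):
        sens = sum(v >= cut for v in group_pos) / len(group_pos)
        spec = sum(v < cut for v in group_neg) / len(group_neg)
        if sens + spec - 1 > best[0]:
            best = (sens + spec - 1, cut, sens, spec)
    return best[1], best[2], best[3]


# Invented feature values (e.g., a baseline DV) for two small groups.
pmr = [310.0, 280.0, 350.0]
cmr = [120.0, 200.0, 150.0, 290.0, 90.0]

p_value = exact_two_sided_p(pmr, cmr)
auc = auc_from_u(pmr, cmr)
cutoff, sensitivity, specificity = youden_cutoff(pmr, cmr)
```

Enumerating all relabelings is feasible only for small groups such as those in this cohort; for larger samples a normal approximation would normally be used instead.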

Response to Therapy (Lugano Assessment)
Relating to the assessment of tumor response at interim, 14 patients (70%) showed CMR, while the remaining 6 patients (30%) had PMR. Relating to the assessment of tumor response at the EOT, 15 patients (75%) showed CMR, 3 (15%) had PMR, and 2 (10%) had PMD (see Table 3). In Figure 2, maximum intensity projections (MIP) of PET images at baseline, at interim, and at the EOT are shown for a patient with CMR at both T1 and T2.
Table 3. Results of response to therapy according to Lugano assessment. Reported data are numbers of patients, with percentages in parentheses.

Prediction of Response to Treatment
The Kolmogorov-Smirnov test revealed that all baseline parameters, according to both T1 and T2 response, were non-Gaussian. Using the Mann-Whitney U test, none of the baseline parameters was useful for prediction of interim response to therapy. Refer to Supplementary Table S1 for median values, interquartile ranges, and associated p-values of all parameters relating to prediction of interim response to treatment. Conversely, referring to prediction of EOT response, $\mathrm{DV}^{0}_{40\%}$ and $\mathrm{DV}^{0}_{60\%}$ were significantly higher in PMR patients than in CMR patients, while $\mathrm{ADC5p}^{0}_{40\%}$ and $\mathrm{ADC95p}^{1}_{20\%}$ were significantly lower in PMR patients than in CMR patients. Refer to Table 4 for median values, p-values, and associated ROC analysis statistics. See Supplementary Materials for median values, interquartile ranges, and associated p-values of all parameters relating to prediction of EOT response (Tables S2 and S3) and for boxplots and ROC curves of significant features (Figures S1-S4).

Assessment of Response to Treatment
The Kolmogorov-Smirnov test revealed that all interim and EOT parameters, according to both T1 and T2 response, were non-Gaussian. Using the Mann-Whitney U test for assessment of interim response to treatment, values of $\mathrm{ADCmean}^{1}_{20\%}$, $\mathrm{ADCsd}^{1}_{20\%}$, and $\mathrm{ADC95p}^{1}$ at 20% and 40% were significantly lower in PMR patients than in CMR patients. For thresholds from 5% to 40%, $\Delta\mathrm{ADCsd}^{01}$ was found to be significantly higher in CMR patients than in PMR patients. The same trend was observed for $\Delta\mathrm{ADC95p}^{01}$ at thresholds of 5%, 20%, 40%, and 60% and for $\Delta\mathrm{ADCentr}^{01}$ at thresholds from 5% to 40%. Conversely, values at interim of DV at 40% showed the opposite trend. In Table 5, median values and associated ROC analysis statistics for discrimination between CMR and PMR patients are reported. See Supplementary Tables S4 and S5 for median values, interquartile ranges, and associated p-values of all parameters relating to assessment of response to treatment at interim, and Figures S5-S22 for boxplots and ROC curves of significant features.
On the other hand, relating to assessment of EOT response, values at T2 of ADC mean and median at the 5% threshold, $\Delta\mathrm{ADCsd}^{02}_{5\%}$, $\Delta\mathrm{ADC5p}^{02}_{60\%}$, and $\Delta\mathrm{ADCsd}^{12}_{5\%}$ were significantly higher in PMR than in CMR patients. Conversely, $\Delta\mathrm{DV}^{02}_{40\%}$ and $\Delta\mathrm{ADCsd}^{02}_{60\%}$ showed the opposite trend. Refer to Table 6 for median values, p-values, and associated ROC analysis statistics for discrimination between CMR and PMR patients. See Supplementary Materials for median values, interquartile ranges, and associated p-values of all parameters relating to assessment of response to treatment at the EOT (Tables S6 and S7) and for boxplots and ROC curves of significant features (Figures S23-S29).

Discussion
The lack of validation and standardization represents the main drawback for a clear definition of the role of WB-DWI in lymphoma diagnosis, staging, and response assessment [17][18][19][20]. In the present preliminary study, we investigated a new WB-DWI-based approach for prediction and assessment of lymphoma response to treatment through the analysis of quantitative WB-DWI histogram features extracted from MR Total Tumor Load tool at six segmentation thresholds (5%, 10%, 20%, 40%, 60%, 80%) and using Lugano criteria applied on PET/CT images as reference standard.
Results of our study revealed the inability of all examined parameters to predict interim response. Concerning prediction of EOT response, DV at 40% and 60% was found to be significantly higher in PMR patients than in CMR patients. This could be related to the poorer response to treatment of PMR patients, which is normally associated with a higher diffusion tumor volume derived from DWI signal intensity [14,17]. Relating to ADC histogram variables, values at baseline of the 5th percentile of ADC at 40% and those of ADC skewness at 20% were, respectively, significantly lower and higher in PMR than in CMR patients and were found to be able to predict EOT response. The interim value of the 95th percentile of ADC at 20% was also found to be useful for this purpose.
The interim value of DV at 40% was also useful for the assessment of interim response, as were the following histogram-related features: values at interim of ADC mean, standard deviation, and entropy at the 20% threshold, and values of ADC 95th percentile at 5%, 20%, 40%, and 60%, were significantly lower in PMR than in CMR. The above-mentioned findings on ADC mean are consistent with the definition of PMR patients (who should have an overall lower ADC mean value than CMR patients) and in line with those obtained in the case study reported by Tsiflikas et al. [14] on a 14-year-old girl with HL, as well as in previous studies involving other tumor types and using the MR Total Tumor Load tool [4,10,12]. The percentage change between T1 and T0 in ADC standard deviation at 5%, 10%, 20%, and 40% was significantly lower in PMR patients than in CMR patients, owing to opposite trends: in PMR patients there was a decrease in ADC standard deviation from T0 to T1, whereas in CMR patients this feature increased from T0 to T1. The same behavior was observed for the percentage change between T1 and T0 in ADC 95th percentile at 20% and 40%, and in ADC entropy from the 5% to the 40% threshold.
Concerning the assessment of EOT response, mean and median values of ADC at the EOT were found to be significantly higher in PMR than in CMR patients. Significant results for assessment of EOT response were also achieved by percentage change between T2 and T0 in DV at 40% and in ADC 5th percentile at 60%, and by percentage change between T2 and T1 in ADC standard deviation at 5%.
The results obtained support the hypothesis that DV and its related histogram-based ADC statistics could be useful in the prediction and assessment of HL response to therapy, as observed in the previously mentioned studies concerning different oncologic diseases and using the MR Total Tumor Load tool [10,12,14].
However, our results cannot be directly compared with any of these, since in our study, DV and its relative ADC histogram parameters were associated to a diffusion volume mask automatically generated by the tool, and so not manually segmented.
In fact, the innovation in our research is that we tried to draw conclusions skipping the mask editing step expected by the tool, which would require the intervention of an expert operator able to exclude normally hyperintense and not tumoral regions, and directly extract the features associated to the unrefined mask.
Our proposed method could offer considerable advantages in terms of saving time and work, also enabling less experienced operators to draw conclusions relating to lymphoma diagnosis. Moreover, being based on the WB-DWI technique, it also retains all the advantages of DWI over CT and PET, such as the absence of ionizing radiation, the fast acquisition of images, and no requirement for contrast injection [21,22]. However, at the same time, it adds substantial limitations to our study. First of all, artifact regions and/or normally hyperintense regions were incorporated in the automatically segmented mask. For example, organs such as the brain, kidneys, and spleen are usually hyperintense. Moreover, because it is frequently activated by chemotherapy, the bone marrow also appeared hyperintense, and this could explain an increase in diffusion volume during chemotherapy.
Limitations related to the gold standard are that the 5-PS and the subsequent Lugano assessment constitute a qualitative gold standard based on visual assessment, which is influenced by inter-observer variability due to the subjectivity of the interpretation [23]. We could have accompanied this qualitative assessment with a semiquantitative one based on Standardized Uptake Values (SUV).
In order to mitigate this limitation, Lugano assessment was performed in consensus between radiologist and nuclear medicine physician.
Furthermore, variations in time interval between the three image acquisitions and the different stage at T0 among patients may have influenced results.
Finally, the patient sample involved was small and unbalanced, and this might have adversely affected statistical results. Moreover, the retrospective nature of our study makes it more prone to bias, and its findings should be validated through prospective studies [24]. Despite this, the novelty of our method may provide a basis for future retrospective and prospective studies involving more participants. More informative results may be obtained if more patients could be investigated, also considering that only three patients were found to have PMR at the EOT, making results on prediction and assessment of EOT response imprecise and unreliable. Moreover, only two patients were found to have PMD and none had SMD, which did not allow us to perform comparisons among these categories.
It could be interesting to evaluate PET images using the same method and compare them with WB-DWI images, both in terms of histogram parameters and in terms of masked total volume (total diffusion volume (DV) in WB-DWI and total metabolic volume in PET, respectively), and to investigate how the unrefined diffusion volume masks extracted by the tool relate to the SUV values associated with the PET component of PET/MRI. Following this line, it would be attractive to investigate the utility of the unrefined total metabolic volume automatically extracted by the tool for prediction and assessment of response to treatment in lymphoma patients, as done by Cottereau et al. [25,26] considering the total metabolic tumor volume (TMTV). Moreover, it would be interesting to integrate imaging information with that arising from lymph node biopsy (which is the gold standard for diagnosing lymphoma) and also from liquid biopsy markers, in order to choose a tailored treatment strategy for each HL patient and better evaluate treatment efficacy [27][28][29].

Conclusions
In conclusion, in this preliminary study, we found that several WB-DWI associated features allowed for assessment of interim response and for both prediction and assessment of EOT response in patients with HL. However, the novelty of our method of feature extraction, with its related restrictions, and the lack of a defined and standardized role of DWI in the management of HL call for further studies involving larger groups of patients, which are essential to investigate the effective impact of our method and validate the results obtained.