Introduction

Over the past decade, more than 100 papers have been published on the use of MR imaging biomarkers to predict various clinical outcomes in rectal cancer, such as treatment response and survival [1,2,3]. Imaging biomarkers range from relatively simple measures (tumor size and volume) [1, 2] to "functional" measures derived from imaging sequences such as diffusion-weighted imaging (DWI) and dynamic contrast-enhanced MRI [4]. More recently, the focus of research has shifted towards more advanced post-processing techniques such as radiomics, in which large numbers of quantitative features are extracted to construct a radiological phenotype of the studied lesion [2, 5]. Common radiomics features include "first-order" histogram features (e.g., mean, standard deviation), shape features (e.g., volume, sphericity), and more complex higher-order texture features (e.g., gray-level co-occurrence matrix features) that describe patterns within the image.

While imaging biomarker studies have shown promising results for predicting oncologic outcomes, several authors have voiced concern about their poor reproducibility and repeatability, related to small and underpowered single-center study designs, lack of independent model validation, and poor reproducibility of imaging features [6,7,8]. Important factors affecting reproducibility are data variations introduced by differences in acquisition, post-processing, and statistical analysis [9]. This is especially relevant for multicenter studies, where data are generated using different hardware, software, and acquisition protocols and are often evaluated by different readers. These variations are often referred to as "center effects" [10], defined as "non-biological systematic differences between measurements of different batches of experiments" [11], and can negatively affect the performance of multicenter models [12].

Studies investigating sources of variation in imaging data have so far mainly focused on CT and PET; only one of 35 studies in a systematic review on radiomics feature reproducibility focused on MRI [9]. Some recent studies have explored variations in quantitative MRI analysis, though mainly in phantoms [13,14,15,16] or in small (< 48 patients) single-center [13, 17, 18] or bi-institutional [19] patient cohorts. The current study aimed to add to these previous data by analyzing a large representative sample of rectal MRIs acquired at multiple institutions in the Netherlands to gain insight into how variations in "real life" clinical MRI data can affect radiomics studies. Specifically, the goal was to investigate sources of variation related to hardware, image acquisition, and post-processing (segmentation methodology and feature extraction software).

Materials and methods

Patients

For this retrospective study, we analyzed a dataset of rectal MRI scans (acquired between 2012 and 2017) previously collected as part of an ongoing IRB-approved retrospective multicenter study on the prediction of response to neoadjuvant treatment, including patients from nine centers in the Netherlands (1 tertiary oncologic referral center, 1 academic center, and 7 non-academic centers). Inclusion criteria for this previous study were (1) biopsy-proven rectal adenocarcinoma, (2) neoadjuvant treatment (chemoradiotherapy or 5 × 5 Gy radiotherapy with a long waiting interval) followed by surgery or watch-and-wait (W&W), (3) availability of a baseline staging MRI (including T2W-MRI and DWI), and (4) availability of clinical outcome to establish response. From this initial cohort of 742 patients, 93 were excluded for the reasons detailed in the in-/exclusion flowchart in Fig. 1, leaving a total study population of 649 patients. The overall study methodology is illustrated in Fig. 2.

Fig. 1

In- and exclusion flowchart

Fig. 2

Study overview. Two types of data variation between centers were analyzed: center-specific variations (related to hardware, image acquisition protocols, and case-mix) and methodology-related sources of variation (related to segmentation and feature extraction methodology). ICC, intra-class correlation coefficient

Imaging and image processing steps

All images were acquired according to routine practice in the participating centers, using various vendors and acquisition protocols. The transverse T2W-MRI and apparent diffusion coefficient (ADC) maps were selected for analysis, as these were the most commonly reported sequences in previous rectal cancer imaging biomarker studies [2]. ADC maps were calculated from the DWI series with a mono-exponential fit of the signal intensity using all available b-values (2 to 7 b-values per sequence, ranging between b0 and b2000). Negative ADC values (< 0) and ADC values more than 3 standard deviations above the mean (> mean + 3 SD) were marked invalid. As T2W pixel values are represented on an arbitrary scale, these images were normalized to mean = 0 and standard deviation = 100. All images were then resampled to a common voxel spacing of 2 × 2 × 2 mm.
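
This processing code is not published with the paper; as a rough illustration only, the Python sketch below (assuming NumPy and SimpleITK, neither of which is named in the text, with all function names ours) implements the three steps described above: a linearized mono-exponential ADC fit over all b-values, the invalid-value rule, and T2W normalization followed by resampling to a common grid.

```python
import numpy as np
import SimpleITK as sitk

def fit_adc(dwi_stack, b_values):
    """Voxel-wise mono-exponential fit S(b) = S0 * exp(-b * ADC).

    dwi_stack: array of shape (n_b, z, y, x), one DWI volume per b-value.
    Linearizing as log S = log S0 - b * ADC allows one least-squares solve
    for all voxels at once; ADC is in mm^2/s if b is in s/mm^2.
    """
    b_values = np.asarray(b_values, dtype=float)
    log_s = np.log(np.maximum(dwi_stack, 1e-6)).reshape(len(b_values), -1)
    design = np.stack([np.ones_like(b_values), -b_values], axis=1)
    coef, *_ = np.linalg.lstsq(design, log_s, rcond=None)
    adc = coef[1].reshape(dwi_stack.shape[1:])
    # Mark implausible values invalid: ADC < 0 or ADC > mean + 3 SD.
    adc[(adc < 0) | (adc > adc.mean() + 3 * adc.std())] = np.nan
    return adc

def normalize_t2w(t2w):
    """Rescale arbitrary T2W intensities to mean = 0, SD = 100."""
    arr = sitk.GetArrayFromImage(t2w).astype(np.float64)
    arr = (arr - arr.mean()) / arr.std() * 100.0
    out = sitk.GetImageFromArray(arr)
    out.CopyInformation(t2w)
    return out

def resample_iso(img, spacing=(2.0, 2.0, 2.0)):
    """Resample an image to a common 2 x 2 x 2 mm grid (linear interpolation)."""
    size = [int(round(sz * sp / ns))
            for sz, sp, ns in zip(img.GetSize(), img.GetSpacing(), spacing)]
    return sitk.Resample(img, size, sitk.Transform(), sitk.sitkLinear,
                         img.GetOrigin(), spacing, img.GetDirection(),
                         0.0, img.GetPixelID())
```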

To explore the effects of segmentation methodology, three types of tumor segmentations were generated using 3D Slicer (version 4.10.2). Segmentations were generated on the high b-value DWI, using the T2W-MRI as an anatomical reference, and then copied to the T2W-MRI and ADC maps. First, a non-expert reader (resident level, with no specific expertise in reading rectal MRI) segmented the rectal tumors by applying the "level-tracing" algorithm on DWI and manually adjusting the result to exclude obvious artefacts and non-tumor tissues (e.g., adjacent organs or lymph nodes). Second, a board-certified radiologist (with > 10 years of experience in rectal MRI) manually revised these segmentations, taking care to precisely delineate the tumor boundaries slice by slice. Third, a single-slice segmentation was derived from this expert segmentation, comprising only the axial slice with the largest tumor surface area. These three segmentations are further referred to as (1) non-expert, (2) expert, and (3) single-slice segmentations.
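
While all three segmentation variants were created in 3D Slicer, the single-slice variant can equally be derived programmatically from the whole-volume mask; a minimal sketch (our own illustration, assuming the mask is available as a slice-first NumPy array) is:

```python
import numpy as np

def largest_area_slice(mask):
    """Reduce a whole-volume tumor mask (z, y, x) to a single-slice mask,
    keeping only the axial slice with the largest segmented area."""
    areas = mask.reshape(mask.shape[0], -1).sum(axis=1)  # voxels per slice
    single = np.zeros_like(mask)
    z = int(np.argmax(areas))  # index of the slice with the largest area
    single[z] = mask[z]
    return single
```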

Imaging features were extracted using PyRadiomics (version 3.0). To explore the effects of feature extraction methodology, features (for the whole-volume expert segmentations) were additionally extracted with similar software settings using a different open-source software package, CapTk (version 1.8.1). Only features defined in both software packages were extracted, 52 in total: 14 first-order, 6 shape, and 32 higher-order features (7 gray-level co-occurrence matrix (GLCM), 16 gray-level size zone matrix (GLSZM), 4 gray-level run-length matrix (GLRLM), and 5 neighboring gray-tone difference matrix (NGTDM)).
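
For reference, extraction with PyRadiomics follows the pattern sketched below. The exact parameter settings used in the study are not reproduced here, so the enabled feature classes are the only study-specific detail; the file names are hypothetical.

```python
import SimpleITK as sitk
from radiomics import featureextractor

# Enable only the feature classes shared between PyRadiomics and CapTk.
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
for cls in ["firstorder", "shape", "glcm", "glszm", "glrlm", "ngtdm"]:
    extractor.enableFeatureClassByName(cls)

image = sitk.ReadImage("t2w_resampled.nii.gz")  # hypothetical file names
mask = sitk.ReadImage("tumor_expert_mask.nii.gz")
features = extractor.execute(image, mask)
for name, value in features.items():
    if not name.startswith("diagnostics"):  # skip extraction metadata
        print(name, value)
```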

Analysis of sources of variation

Center variations (case-mix, hardware, and image acquisition)

To investigate potential effects of "case-mix" differences, baseline patient characteristics were compared between centers using the Kruskal–Wallis test for age, T-stage, and N-stage, and the chi-squared test for sex.

As a first exploratory step, we derived 6 basic imaging features (minimum, maximum, mean, standard deviation, entropy, and tumor volume) for each patient from both the T2W-MRI and the ADC map. The distribution of these features within our cohort was then visualized for each center separately using notched boxplots. To test whether the medians of the derived features differed significantly between patients from different centers, we used the Kruskal–Wallis test; to identify which specific centers had different feature distributions, post hoc pairwise Mann–Whitney U tests were performed with Bonferroni correction to account for multiple testing. Supplementary Materials 1 describes a sub-analysis exploring whether differences between centers can be harmonized retrospectively by adjusting the b-values for ADC calculation or by performing data normalization using reference organs or z-transformation.
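
As a concrete sketch of this testing scheme (our illustration using SciPy; the function name and data layout are assumptions), applied per feature:

```python
from itertools import combinations
from scipy import stats

def compare_centers(values_by_center, alpha=0.05):
    """Kruskal-Wallis test across all centers, followed by Bonferroni-
    corrected post hoc pairwise Mann-Whitney U tests.

    values_by_center: dict mapping center name -> 1-D array of one
    feature's values for all patients from that center.
    """
    _, p_global = stats.kruskal(*values_by_center.values())
    pairs = list(combinations(values_by_center, 2))
    corrected_alpha = alpha / len(pairs)  # Bonferroni correction
    posthoc = {}
    for a, b in pairs:
        _, p = stats.mannwhitneyu(values_by_center[a], values_by_center[b],
                                  alternative="two-sided")
        posthoc[(a, b)] = (p, p < corrected_alpha)
    return p_global, posthoc
```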

Using multivariable linear regression, we further explored the effects of variations in hardware (vendor/scanner model, field strength) and acquisition parameters (slice thickness, acquired in-plane resolution, repetition time, echo time, number of signal averages, maximum b-value, number of b-values, signal-to-noise ratio) on ADC, and compared these to various patient-intrinsic (baseline and clinical outcome) parameters previously reported to correlate with ADC (sex, age, cT-stage, cN-stage, response to chemoradiotherapy, and tumor volume [1, 2, 20]). "Center" (i.e., hospital) was investigated as a final parameter to account for unknown variations not covered by the other variables (e.g., patient preparation and undocumented acquisition parameters). Analyses were performed using R version 3.6.1, and p values < 0.05 were considered statistically significant. Further details on the regression analysis are provided in Supplementary Materials 2.
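
The analysis was run in R; the statsmodels sketch below conveys the same idea in Python. The variable names are our assumptions, not the study's exact coding; each model's R² indicates how much of the ADC variation a predictor group explains on its own, mirroring the comparison reported in Table 3.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("adc_cohort.csv")  # hypothetical export, one row per patient

# One model per predictor group; R^2 = proportion of ADC variation explained.
acquisition = smf.ols(
    "mean_adc ~ slice_thickness + tr + te + nsa + max_b_value + n_b_values",
    data=df).fit()
patient = smf.ols(
    "mean_adc ~ sex + age + ct_stage + cn_stage + response + tumor_volume",
    data=df).fit()
center = smf.ols("mean_adc ~ C(center)", data=df).fit()  # umbrella variable

for label, model in [("acquisition", acquisition),
                     ("patient-intrinsic", patient),
                     ("center", center)]:
    print(f"{label}: R^2 = {model.rsquared:.3f}")
```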

Image segmentation

Imaging features were compared between the expert, non-expert, and single-slice segmentations using the two-way absolute-agreement intra-class correlation coefficient (ICC), with ICC < 0.50 indicating poor agreement, 0.50 ≤ ICC < 0.75 moderate agreement, 0.75 ≤ ICC < 0.90 good agreement, and ICC ≥ 0.90 excellent agreement [21].
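
A minimal sketch of this comparison, assuming the pingouin package (whose ICC2 corresponds to the two-way random-effects, absolute-agreement, single-rater ICC; the helper names are ours):

```python
import pandas as pd
import pingouin as pg

def icc_absolute_agreement(long_df):
    """Two-way absolute-agreement ICC for one feature.

    long_df needs columns 'patient', 'segmentation' (expert / non-expert /
    single-slice), and 'value' (the feature value)."""
    icc = pg.intraclass_corr(data=long_df, targets="patient",
                             raters="segmentation", ratings="value")
    return float(icc.loc[icc["Type"] == "ICC2", "ICC"].iloc[0])

def grade(icc):
    """Map an ICC to the agreement categories used in the study [21]."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"
```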

Feature extraction software

Imaging features derived with PyRadiomics (using expert segmentations) were compared to those derived using CapTk using the two-way absolute agreement ICC and the same cut-offs for agreement detailed above [21].

Results

Sources of variation

Center variations (patient-mix, hardware, and image acquisition)

Baseline characteristics of the 649 study patients (417 male; median age 65 years) are provided in Table 1. There were no significant differences in cT-stage, age, or sex distribution between the nine centers (p = 0.11–0.69); only cN-stage differed, being significantly higher in one center (p < 0.001). An overview of the main variations in hardware and acquisition protocols is provided in Table 2. The distribution of basic first-order feature values per center and post hoc analyses illustrating differences between individual centers are depicted in Fig. 3. All tested features differed significantly between centers (Kruskal–Wallis p < 0.001) on both T2W-MRI and ADC. Pairwise comparisons between individual centers revealed that mainly the ADC mean, minimum, and maximum differed significantly between the majority of centers, while for the T2W-MRI features and for ADC standard deviation and entropy, differences were limited to 2–4 individual centers. Data variations between centers did not improve after b-value harmonization and remained significant after applying different retrospective normalization methods, although normalization using inguinal lymph nodes as a reference organ did reduce data variations, as outlined in Supplementary Materials 1. Tumor volumes were largely comparable between centers and differed significantly only between a minority of individual centers.

Table 1 Baseline patient characteristics and variations between centers
Table 2 Overview of main variations in hardware and acquisition protocols between the 9 participating centers
Fig. 3

Center variations. a Visualization of the distribution of 6 basic (first-order + volume) imaging features within our study cohort, grouped by center. The imaging features were extracted from the rectal tumors on the ADC map (upper row) and T2W-MRI (bottom row), respectively. The boxplots show the distribution of the feature values for all patients within each center, with the notches in each box plot representing the 95% confidence intervals of the median feature value within a center. Kruskal–Wallis tests showed that for all features these median values were significantly different between the centers (p < 0.001). b Additional post hoc pairwise significance tests to explore which specific centers had significantly different feature values, with pink indicating no significant differences between centers and green indicating a significant difference (darker green corresponding to a higher level of significance). Bonferroni correction was used to account for multiple testing

The results of the regression models developed to investigate the influence of hardware, acquisition, and patient-intrinsic (baseline and clinical outcome) parameters on mean tumor ADC are reported in Table 3. Acquisition parameters had the strongest association with ADC and on their own explained 64.3% of the variation in ADC present in the data. Patient-intrinsic parameters (e.g., age, sex, TN-stage, treatment response) had a negligible effect, explaining only 0.4% of the variation in ADC. The umbrella variable "Center" explained 32.5% of the variation in ADC. When all factors were combined in one model, 63.5% of the data variation was explained, with acquisition and hardware parameters as the main predictors.

Table 3 Factors contributing to mean tumor ADC

Image segmentation

The results of the reproducibility analysis using different segmentation strategies are depicted in Fig. 4A. Reproducibility between expert and non-expert segmentations was good–excellent for the majority of features (first-order, shape, and higher-order), with ICCs ranging between 0.72 and 0.99 (median 0.90) for T2W-MRI and between 0.53 and 0.99 (median 0.89) for ADC. Compared to the expert whole-volume segmentations, the single-slice segmentations resulted in considerably lower reproducibility, with ICCs of 0.00–0.94 (median 0.40) for T2W-MRI and 0.00–0.97 (median 0.58) for ADC, and particularly poor results for shape, GLSZM, and NGTDM features.

Fig. 4

Effects of image segmentation and feature extraction methodology. Feature reproducibility for ADC (upper row) and T2W-MRI (bottom row) using different segmentation methods (a) and feature extraction packages (b). Each column shows the percentage of features with excellent (dark green, ICC > 0.90), good (green, 0.90 > ICC > 0.75), moderate (orange, 0.75 > ICC > 0.5), or poor (red, ICC < 0.5) agreement. In total, 52 features were analyzed, including 14 first-order, 6 shape, 7 gray-level co-occurrence matrix (GLCM), 4 gray-level run-length matrix (GLRLM), 16 gray-level size zone matrix (GLSZM), and 5 neighboring gray-tone difference matrix (NGTDM) features

Feature extraction software

The influence of feature extraction software is depicted in Fig. 4B. The majority of first-order, shape, GLCM, and GLRLM features showed good–excellent reproducibility, with similar results for features derived from T2W-MRI and ADC (ICCs ranging between 0.00 and 1.00; median 0.99 for both). In contrast, the majority of GLSZM and NGTDM features were poorly reproducible, with ICCs of 0.00–0.56 (median 0.00) for T2W-MRI and 0.01–0.99 (median 0.41) for ADC.

Discussion

The results of this study show that variations in quantitative imaging (radiomics) features in a large clinical multicenter dataset of rectal MRIs were more substantial for DWI/ADC than for T2W-MRI and were mainly related to hardware and acquisition protocols (i.e., "center effects"). The effects of segmentation methodology and feature extraction software on feature variation were smaller, particularly for the more basic first-order and shape features, which showed overall good reproducibility.

An exploratory analysis of the distribution of 6 basic imaging features (first-order + volume) showed significant variation between patient populations from different centers, with more pronounced differences for ADC than for T2W-MRI. Tumor volume was the most robust feature, with the most comparable results between centers. This is in line with a previous report on the repeatability of MRI features in a small cohort (n = 48) of patients with brain glioblastoma, which showed that shape features (including volume) had higher repeatability than features derived from T2W-MRI pixel intensities [18]. Similarly, shape features were found to have the highest repeatability and reproducibility on T2W-MRI of cervical cancer [22].

Since the ADC data showed the largest variations, we developed a linear regression model to investigate in depth which factors influence mean tumor ADC. The majority (> 60%) of the ADC variation could be explained using only hardware and acquisition-related parameters, while patient-intrinsic (clinical and tumor) features alone explained only 0.4% of the variation. This suggests that, when building multicenter prediction models, any potential relation between clinical outcomes and ADC will likely be obscured unless acquisition and hardware differences are corrected for. This is a very relevant issue when incorporating ADC into retrospective multicenter studies without protocol standardization. Even in controlled prospective study designs with harmonized acquisition protocols, variation in ADC can still be a limiting factor, as coefficients of variation range from 4 to 37% depending on the measured organ [23, 24]. Differences in acquisition settings have previously also been shown to substantially reduce inter- and intra-scanner reproducibility of T2W-MRI radiomics features, with particularly poor results for higher-order features [13, 16].

Several methods have been suggested to correct for data variations between centers. A first option is to discard features that are poorly reproducible across centers, which yields simpler models but discards potentially valuable information [12]. A second and third option are normalization in the image domain (e.g., z-transform or within-patient normalization using reference tissues) or in the feature domain (e.g., ComBat harmonization). These techniques have been shown to significantly improve T2W-MRI feature reproducibility [18] and to successfully correct for "batch effects" (analogous to "center effects") in genomic studies [12]; ComBat has recently been adopted for radiomics with promising results [25]. In our exploratory analysis described in Supplementary Materials 1, data normalization using lymphoid tissue (benign inguinal lymph nodes) as a reference organ reduced ADC data variations between centers, though differences remained statistically significant, and the benefits of this approach need to be investigated further in studies where features are tested against a clinical outcome. A fourth option is to use statistical models that explicitly account for center effects (e.g., random/mixed-effects models [10]). These options all have their strengths and weaknesses, and evidence-based guidelines on the preferred (combination of) methods to handle center effects in multicenter radiomics research are so far lacking and urgently needed.
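
To make the feature-domain option concrete, the simplest conceivable correction is a per-center z-transform of each feature, sketched below (our illustration; this is not the normalization evaluated in Supplementary Materials 1). Unlike ComBat, it uses no empirical Bayes pooling and does not preserve known biological covariates, so it can remove true signal when the case-mix differs between centers.

```python
import pandas as pd

def zscore_per_center(features, center):
    """Naive feature-domain harmonization: z-transform each feature column
    within each center so all centers share mean 0 / SD 1.

    features: DataFrame (patients x features); center: Series aligned on
    the same index giving each patient's center."""
    return features.groupby(center).transform(
        lambda col: (col - col.mean()) / col.std())
```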

Regarding image segmentation methodology, we found poor reproducibility for higher-order features (e.g., GLSZM and GLRLM) but overall good reproducibility for simpler features (e.g., first-order, GLCM), similar to previous reports [9, 26, 27]. Features derived from single-slice segmentations showed the poorest reproducibility, in line with previous single-center reports [28, 29], indicating that single-slice methods, though less cumbersome, are not recommended. Interestingly, feature reproducibility was good–excellent between expert and non-expert readers, indicating that input from expert radiologists is not necessarily required. This matches a previous report in which a radiomics model trained to predict response to chemoradiotherapy in rectal cancer achieved similar performance regardless of whether segmentations were performed by expert (AUC 0.67–0.83) or non-expert readers (AUC 0.69–0.79) [30]. This is reassuring given the considerable workload associated with image segmentation, especially when analyzing large volumes of imaging data. Another way to reduce this workload could be to use computer algorithms to (semi-)automatically generate tumor segmentations [31]. Promising reports suggest that such algorithms can generate segmentations comparable to manual tumor delineations [31, 32], provided that image quality is good [33].

When comparing feature reproducibility between software packages, we found that the majority of first-order, shape, GLCM, and GLRLM features showed good to excellent reproducibility, whereas the majority of higher-order features (GLSZM and NGTDM) were poorly reproducible. This is in line with earlier findings that higher-order features are generally less reproducible than first-order and shape features [9], which can probably be attributed in part to technical differences in the implementation of features and/or image processing between software packages and computational algorithms. This underlines the importance of accurately reporting software versions and of preferably using packages with standardized feature definitions, such as those defined by the Image Biomarker Standardization Initiative (IBSI). Considering the poor reproducibility of the higher-order features in our dataset, caution should be exercised when incorporating more advanced features into clinical prediction models.

The main novelty of the current study lies in its multicenter design. Although previous studies have already identified acquisition parameters [13, 15, 16], segmentation [19], and post-processing methods [15, 18] as factors affecting feature reproducibility, these studies have so far mainly been performed on non-MRI data or in small patient cohorts or phantoms. The extent to which these effects influence feature reproducibility and may obscure correlations with common clinical outcomes in a representative "real life" clinical cohort of MRI data acquired at various institutions has not been previously reported. There were, however, some limitations to our study design in addition to its retrospective nature. As the data were anonymized to comply with privacy regulations, only basic acquisition information could be extracted from the DICOM headers. Other potential sources of variation, such as coil use, fat suppression, MRI software version, and patient preparation, could therefore not be assessed. In addition, all segmentations were performed on the high b-value DWI (and then copied to the other sequences). Although the anatomical information of the T2W-MRI was taken into account during segmentation, ideally a separate segmentation would have been performed on T2W-MRI; this, however, was not feasible within an acceptable timeframe. Finally, several data-processing choices (e.g., resampling voxel size, bin width, gray-level discretization, and T2W-MRI normalization) were made that may have influenced the extracted features [34,35,36]. Although some of these steps may have introduced bias in our analyses, a more detailed analysis of the impact of these choices was beyond the scope of this paper.

In conclusion, this study has shown that significant variations are present in multicenter rectal MRI data, with more substantial variations in DWI/ADC than in T2W-MRI, mainly related to hardware and image acquisition protocols (i.e., "center effects"). These effects need to be accounted for when analyzing multicenter MRI datasets to avoid overlooking potential correlations with the clinical outcome under investigation. Image segmentation has relatively minor effects on image quantification, provided that whole-volume segmentations are performed. Expert segmentation input is not necessarily required to obtain stable features, which could shift the daunting task of image segmentation from expert radiologists to less experienced readers or even (semi-)automatic software algorithms. Higher-order features were less reproducible between software packages, and caution is therefore warranted when implementing them in clinical prediction models.