Radiomics in prostate cancer imaging for a personalized treatment approach - current aspects of methodology and a systematic review on validated studies

Prostate cancer (PCa) is one of the most frequently diagnosed malignancies in men worldwide. Because treatment options vary across risk groups, accurate diagnosis and risk stratification are pivotal for the treatment of PCa. The development of precise medical imaging procedures, alongside improvements in big data analysis, has led to the establishment of radiomics, a computer-based method of quantitatively extracting and analyzing image features. This approach has the potential to improve PCa detection, tissue characterization and clinical outcome prediction. This article gives an overview of current methodological aspects and systematically reviews the available literature on radiomics in PCa patients, showing its potential for personalized therapy approaches. The qualitative synthesis includes all imaging modalities, focuses on validated studies and puts forward future directions.


Introduction
In global cancer statistics, prostate cancer (PCa) is one of the most frequently diagnosed malignancies in men and the fifth leading cause of cancer death among men worldwide [1,2]. Therefore, the development of accurate diagnostic tools is of great importance. Many modern imaging modalities are of great value in the screening, diagnosis, treatment response assessment and prognostic evaluation of PCa patients. A suspicious digital rectal examination and/or an elevated serum prostate-specific antigen (PSA) level lead to transrectal ultrasound (TRUS)-guided biopsy for histopathologic verification of PCa [3]. In recent years, an augmented version of this strategy that includes magnetic resonance imaging (MRI) has gained traction in clinical practice and has been incorporated into guidelines [4]. MRI is employed not only prior to biopsy but also for local staging and follow-up [5]. Nevertheless, diagnostic accuracy is still hampered by inter-observer variability, and accurate lesion detection is not guaranteed [6][7][8]. To improve interpretation, reporting and acquisition standards and to harmonize them globally, the "Prostate Imaging-Reporting and Data System Version 2.1" (PI-RADSv2.1) was established [9]. Bone scans and computed tomography (CT) used to be the standard of care (SoC) for staging and restaging. Recently, prostate-specific membrane antigen positron emission tomography (PSMA-PET) has been implemented into clinical practice and is recommended in current guidelines for staging and restaging [10][11][12][13]. Additionally, growing evidence supports the use of PSMA-PET for intraprostatic lesion detection and segmentation [14][15][16][17][18]. In addition to an accurate diagnosis, proper and decisive risk stratification is crucial because of the variety of treatment options in different clinical scenarios.
However, recommended models for risk classification [5,19,20] might not always predict the final outcome at every stage of PCa [21,22]. Thus, new concepts for adequate detection and risk stratification are required to move towards precision medicine and personalized treatment. With the rise of big data analysis, the computer-based extraction of predefined image features, so-called "hand-crafted" radiomic features (RF), has emerged as a research field that might meet this need. It is hypothesized that medical images contain more information than trained professionals can discern visually. Put simply, RF might provide additional information about a tumor or other tissues, facilitating diagnosis, risk stratification and the prediction of therapeutic outcome. One advantage of radiomics is that it uses SoC images without additional effort, and the abundance of available medical images can be exploited for longitudinal monitoring. Another benefit is that radiomics examines whole tumors, whereas biopsy schemes are prone to sampling errors due to intratumoral heterogeneity [23,24]. Thereby, radiomics offers great potential for the personalization of therapeutic approaches, in particular in image-based disciplines such as radiation oncology. This review first gives an overview of methodological aspects of radiomics, followed by the methodology of our literature search and a qualitative synthesis of radiomics in prostate cancer, based on a systematic search and subdivided by imaging modality. Previous reviews on this topic have mostly focused on MRI, whereas our review includes all imaging modalities and concentrates on papers with internal or external validation [25][26][27][28][29].

Radiomics Pipeline
The Radiomics Pipeline (Figure 1) is the entire sequence of data processing from imaging to a diagnostic, predictive or prognostic model based on RF. It is subdivided into three major operations:
(1) Image acquisition and preprocessing
(2) High-throughput feature extraction
(3) Data integration and data analysis

The Image Biomarker Standardization Initiative (IBSI) published a reference manual to harmonize the feature extraction by providing (i) definitions, (ii) a standardization of the radiomics pipeline, (iii) reference datasets and (iv) a reporting scheme [30].
(1) Image acquisition and preprocessing
All imaging modalities mentioned in the introduction can be utilized for PCa radiomics: TRUS, MRI, PSMA-PET, CT and bone scans. Importantly, heterogeneity in image acquisition and image reconstruction algorithms, arising from different local standards, is responsible for the limited repeatability and reproducibility of RF [31][32][33][34]. Prospective trials with fixed imaging protocols could help ensure that a scan yields similar results in the same patient when repeated on the same system (repeatability) as well as on different systems and at different institutions (reproducibility) [33]. After image acquisition, the volume of interest (VOI) is delineated manually, semi-automatically or fully automatically. If manual segmentation is performed, a detailed protocol should be used throughout the whole dataset to minimize inter-observer variability [35]. Subsequently, and before feature extraction, images should be preprocessed, e.g., by intensity inhomogeneity correction or noise filtering for MR images [32]. Preprocessing steps are known to have a significant impact on the robustness and reproducibility of RF, so the identification of generalizable and consistent preprocessing algorithms is a pivotal step [36].
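As a rough illustration of this step, the sketch below shows two operations that commonly precede feature extraction: z-score intensity normalization using the voxels inside the VOI mask, and discretization of the masked intensities into a fixed number of gray levels, modeled on the fixed-bin-number formula described in the IBSI manual. It is a minimal sketch assuming NumPy arrays for the image and a binary mask; the function names and the default of 32 bins are illustrative choices, not settings taken from the reviewed studies, and bias-field correction or noise filtering are not shown.

```python
import numpy as np

def zscore_normalize(image, mask):
    """Z-score normalize the whole image using the mean/SD of the voxels inside the VOI mask."""
    image = np.asarray(image, dtype=float)
    roi = image[np.asarray(mask, dtype=bool)]
    std = roi.std()
    if std == 0:
        raise ValueError("VOI has constant intensity; z-score is undefined")
    return (image - roi.mean()) / std

def discretize_fixed_bin_number(image, mask, n_bins=32):
    """Map VOI intensities to gray levels 1..n_bins using equal-width bins.

    Uses floor(n_bins * (x - min) / (max - min)) + 1, with the maximum assigned
    to n_bins. Voxels outside the mask are set to 0.
    """
    image = np.asarray(image, dtype=float)
    m = np.asarray(mask, dtype=bool)
    roi = image[m]
    lo, hi = roi.min(), roi.max()
    levels = np.zeros(image.shape, dtype=int)
    if hi == lo:
        levels[m] = 1  # a constant VOI collapses to a single gray level
        return levels
    binned = np.floor(n_bins * (roi - lo) / (hi - lo)).astype(int) + 1
    levels[m] = np.minimum(binned, n_bins)
    return levels

image = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
mask = np.ones_like(image, dtype=bool)
print(discretize_fixed_bin_number(image, mask, n_bins=5).tolist())  # [[1, 2, 3], [4, 5, 5]]
```

Whether to discretize with a fixed bin number or a fixed bin width is itself a preprocessing choice that changes texture feature values, which is one reason detailed reporting of these steps matters.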
(2) High-throughput feature extraction
The spatial and gray-level information of the segmented voxels is used in numerous mathematical calculations to extract predefined "hand-crafted" RF. These can be computed with various open-source packages such as PyRadiomics [37], IBEX [38], RaCat [39], QIFE [40], MaZda [41], CERR [42] or LIFEx [43], as well as with commercial products [44]. Additionally, radiomics code implemented in MATLAB® is commonly used. It is very important to validate the software tools used, especially self-developed software, against the datasets provided by the IBSI [30] to increase reproducibility, robustness and comparability across platforms. Current versions of PyRadiomics, LIFEx, RaCat and CERR, as well as an adapted version of IBEX, comply with the IBSI. Besides the extraction of hand-crafted RF, convolutional neural networks, a class of deep learning models within machine learning (ML), can be used for pattern recognition and image feature analysis directly on the images [45]. This can be combined with predefined "hand-crafted" features [46,47], but mostly ML algorithms build models autonomously from large amounts of data [48].
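The IBSI recommendation to validate software against reference datasets can be pictured as a tolerance check: each feature computed on the reference data is compared with the published reference value and its tolerance. The sketch below assumes the reference values have already been loaded into a dictionary; the function name, the feature names and the numbers in the example are hypothetical placeholders, not actual IBSI reference values.

```python
import math

def check_against_reference(computed, reference):
    """Return the names of features that fail a reference-value check.

    computed:  {feature_name: value} produced by the software under test
    reference: {feature_name: (reference_value, tolerance)}
    A feature fails if it is missing, not finite, or outside reference +/- tolerance.
    """
    failures = []
    for name, (ref_value, tolerance) in reference.items():
        value = computed.get(name)
        if value is None or not math.isfinite(value) or abs(value - ref_value) > tolerance:
            failures.append(name)
    return failures

# Hypothetical placeholder numbers for illustration only (NOT real IBSI reference values)
reference = {"feature_a": (1.00, 0.01), "feature_b": (5.00, 0.10)}
computed = {"feature_a": 1.005, "feature_b": 5.20}
print(check_against_reference(computed, reference))  # ['feature_b']
```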

(3) Data integration and data analysis
Often a vast number of RF are computed, and this abundance demands feature selection and/or reduction to avoid overfitting and to exclude irrelevant or redundant features. Many features are correlated with each other; such redundant features can be visualized with heatmaps and should be omitted [49,50]. Additionally, feature selection algorithms such as minimum redundancy maximum relevance (mRMR) or the Fisher score can be used to assess the relevance and mutual correlation of RF [51,52]. Another option for feature reduction is to prioritize robust features [34]. Overviews of feature reduction steps, including quantitative comparisons of their performance, are given by Leger et al. [53] and Parmar and colleagues [54]. The remaining features can be analyzed alone or in combination with other clinical parameters, applying classical statistical methods or ML for data integration and modeling. Examples of ML classifiers are random forests, support vector machines and nearest neighbors [51,55]. When the data analysis is based on classical statistical methods, correcting for multiple testing is recommended to control false-positive results and avoid overfitting [56,57]. After a model has been generated from RF, it should be validated to evaluate its performance and assess its generalizability [58]. In recent years, increasing emphasis has been placed on this last step [33][34][35]. For internal validation, the data are usually split into three datasets: (i) a training dataset for optimizing the parameters of a model, (ii) a validation dataset for hyperparameter optimization, e.g., the depth of a tree or a deep learning architecture, and (iii) a test set for the final assessment. The test set might also be used independently for validation. In cross-validation (CV), which is typically used for small datasets, the data are split accordingly in an iterative process. K-fold CV partitions the dataset into k subsets, using one as the validation set and the remaining k-1 for training.
This process is repeated for each subset. Leave-one-out CV operates similarly but leaves out one patient for validation while using the rest for training. CV should be used with caution, as leave-one-out CV in particular tends to be overly optimistic [59,60]. External validation based on independent datasets from different institutions provides the highest quality of validation [58].

Figure 1. Radiomics pipeline. The pipeline depicts the data processing and operations used to build a radiomics model with validation. First, an image is acquired and segmented manually, semiautomatically or fully automatically. Then, feature extraction is performed after preprocessing. Feature classes are shape features, first-order features and texture features. Due to the abundance of RF, a selection or reduction should be performed before or while integrating them with histology, genomics or clinical data. Data analysis can be performed using classical statistical models, machine learning or deep learning. A predictive, prognostic or diagnostic model is built and should be internally or externally validated. Abbreviations: GLCM = gray level co-occurrence matrix; GLDZM = gray level distance zone matrix; GLRLM = gray level run length matrix; GLSZM = gray level size zone matrix; NGTDM = neighboring gray tone difference matrix; NGLDM = neighboring gray level dependence matrix; RF = radiomic features.
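The feature-reduction and validation steps of part (3) can be sketched together. The example below is a minimal sketch, assuming a NumPy feature matrix with one row per patient: a simple greedy filter drops features whose absolute Pearson correlation with an already kept feature exceeds a threshold (a crude stand-in for heatmap inspection, not mRMR or the Fisher score), and index generators implement k-fold and leave-one-out CV. The threshold of 0.9 and all function names are illustrative assumptions.

```python
import numpy as np

def drop_correlated_features(X, threshold=0.9):
    """Greedy redundancy filter; returns indices of the columns of X to keep.

    Constant columns are skipped first because their correlation is undefined.
    A column is kept only if its absolute Pearson correlation with every
    previously kept column is <= threshold (the result depends on column order).
    """
    X = np.asarray(X, dtype=float)
    candidates = [j for j in range(X.shape[1]) if X[:, j].std() > 0]
    kept = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= threshold for k in kept):
            kept.append(j)
    return kept

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx); every sample appears in exactly one validation fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = np.sort(folds[i])
        train_idx = np.sort(np.concatenate([folds[j] for j in range(k) if j != i]))
        yield train_idx, val_idx

def leave_one_out_indices(n_samples):
    """Leave-one-out CV: k-fold CV with one patient per validation fold."""
    return kfold_indices(n_samples, n_samples)

# Usage on synthetic data: feature selection is fitted on the training fold only
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
X[:, 1] = 2 * X[:, 0] + 0.01 * rng.normal(size=20)  # near-duplicate of feature 0
for train_idx, val_idx in kfold_indices(len(X), k=5):
    kept = drop_correlated_features(X[train_idx])
    # a classifier would be trained on X[train_idx][:, kept] and evaluated on X[val_idx][:, kept]
```

Fitting the feature filter inside each training fold, rather than once on all patients, is the design choice that keeps information from the validation patients out of model building; selecting features on the full dataset first would make the CV estimate optimistic.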

Hand-crafted Radiomic Features
"Hand-crafted" RF [48] can be grouped in shape descriptors, 1 st order features and texture features. Shape features describe the morphology of the VOI for instance the size, volume or diameter. 1 st Order features are based on an intensity histogram derived of the segmented voxels [24]. Texture features are more advanced and do not only rely on voxel intensities i.e., gray levels but on spatial information as well. First introduced by Haralick et al. the gray level co-occurrence matrix (GLCM) assesses the gray levels of pairs of neighboring voxels [61]. Others like the gray level size zone matrix (GLSZM) [62] and the gray level run length matrix (GLRLM) [63] analyze groups of consecutive voxels, zones, or runs of connected voxels in one direction, respectively. For a more complete description of features, texture matrices and their mathematical calculations we recommend the IBSI and their documentation [30]. High-order features are calculated on filter transformed images like wavelets or gaussian bandpass filer [24].

Methodology
Studies eligible for inclusion met the following criteria: articles had to address PCa radiomics with predefined "hand-crafted" features derived from MRI, TRUS, CT, choline- or PSMA-PET and had to apply internal or external validation. Papers not written in English and non-original articles were excluded. Two of the co-authors (SKBS and ASB) independently performed a PubMed/Medline, EMBASE and Cochrane Library database search for the terms: (cancer of prostate[MeSH Terms]) AND ((texture features) OR (radiomics)). If the two readers disagreed on the inclusion or exclusion of a study, a third reader (CZ) decided on eligibility; this was necessary in 11 cases. The period considered in this literature review was from 1 January 2014 [64] to 1 January 2021. A total of 251 articles were identified. Additionally, 22 manuscripts were identified through other sources (e.g., Google Scholar or references in screened manuscripts). Thirty-five duplicates were removed. Only articles that met the inclusion criteria were included. Finally, 77 studies were included in the qualitative synthesis. Figure 2 gives a detailed description of the literature search according to PRISMA [65]. Due to heterogeneity in imaging modalities, applied radiomics pipelines and analyzed endpoints, no quantitative analysis was performed. Additionally, we assessed whether the software used complied with the IBSI.
Furthermore, ongoing clinical trials were screened on "clinicaltrials.gov". Trials eligible for inclusion fulfilled the following criteria: ongoing trials on PCa RF with "hand-crafted" features derived from MRI, TRUS, CT or PSMA-PET. Trials with unknown status were excluded. CZ performed the search for the terms ("Condition or disease: prostate cancer" AND "radiomics" OR "texture features"). Six trials were identified, and one trial (NCT03294122) was excluded due to unknown trial status.

Prostate cancer detection
Five studies investigated RF for the prediction of extracapsular extension and reported high AUC values between 0.80 and 0.90 for radiomic signatures based on T2w and ADC sequences [90,103,120], which outperformed clinical or nomogram models [104,105]. Two studies, by Wang et al. and Zhang et al., showed that mpMRI-derived RF perform well for bone metastasis prediction in untreated PCa, with ROC-AUC values of up to 0.92 [85,107]. Six studies analyzed the performance of RF in terms of outcome [108][109][110][111][117][122]. Bourbonne et al. externally validated an ADC-based RF (SZEGLSZM), identified in a previous study [94], for biochemical recurrence (BCR) prediction after surgery, with an accuracy of 0.76 [108,109]. Shiradkar et al. demonstrated an ML classifier derived from T2w and ADC RF with good prediction of BCR after surgery or RT, which was externally validated with an AUC of 0.73 [110]. Another RF model by Zhong et al. showed good performance for BCR prediction after RT of localized PCa [117]. Abdollahi and colleagues indicated that RF from pre- and post-treatment ADC images are predictive of treatment response after primary external beam radiotherapy [111].
Another study from this group demonstrated that RF of pre-radiotherapy images provided good ROC-AUC values of up to 0.81 for rectal toxicity prediction [122]. One study by Sunoqrot et al. developed a quality-control system to assess automated prostate segmentations, with external validation [118], and two studies by Lay et al. and Giannini et al. addressed RF-based PCa segmentation [112,113].
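Most of the performance figures above are reported as the area under the ROC curve. As a reminder of what this number means, the sketch below computes the AUC directly from its probabilistic definition: the fraction of (positive, negative) patient pairs in which the positive case receives the higher model score, with ties counted as one half. This pairwise form is equivalent to the normalized Mann-Whitney U statistic; it is written for clarity rather than speed, and the function name and example scores are illustrative.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive case > score of a random negative case).

    Ties contribute 0.5. Runs in O(n_pos * n_neg), which is fine for cohort-sized data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model scores for four patients (1 = event, 0 = no event)
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which helps put values such as 0.73 or 0.92 into perspective.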
Three studies aimed at GS discrimination [124][125][126] and demonstrated excellent ROC-AUC values between 0.81 and 0.91. Two studies chose intraprostatic tumor detection as the study endpoint [123,126]. A study by Zamboglou et al. reported two distinct RF (SAE, local binary pattern small-area emphasis; SZNUN, local binary pattern size-zone non-uniformity normalized) with good performance in detecting significant PCa lesions not visible on PSMA-PET/CT. This result was externally validated in an independent cohort [123]. Cysouw et al. demonstrated an RF-based machine learning model to predict lymph node involvement, the presence of metastases, GS ≥8 and the presence of extracapsular extension [125].
GS discrimination by RF was the aim of four studies using TRUS [134], CT [128,133] or cone-beam CT (CBCT) images [129], which reported excellent ROC-AUC values between 0.77 and 0.98 using one or multiple RF for modeling. Three studies defined intraprostatic tumor detection in TRUS images [47,134,136] as a study endpoint. Again, the implementation of one or multiple RF led to very promising results in PCa detection. The study by Wu et al. implemented RF for automatic prostate gland delineation in TRUS images [135] and observed results similar to manual delineation by experts. One study [130] implemented RF to predict bladder and bowel toxicity after radiotherapy of PCa patients and reported ROC-AUC values of up to 0.77 by integrating clinical information with RF. The study by Osman et al. suggested that RF derived from CT images might enhance the interpretation of treatment response of bone metastases [133], and Acar et al. demonstrated that RF derived from the CT images of PSMA-PET/CT scans could accurately distinguish between metastatic lesions and sclerotic areas [132]. The RF model in a study by Peeken et al. outperformed conventional measures for the detection of lymph node metastases [131].

IV Ongoing trials
In total, 5 trials were identified that use mpMR imaging (n = 4), PET (n = 1), CT (n = 1) and bone scans (n = 1) to extract RF (see Table 4). Four trials evaluate RF for outcome prediction during or after several treatment approaches: active surveillance, surgery, radiotherapy, or radionuclide therapy in addition to chemotherapy. Two of these four trials integrate RF with molecular markers for modeling. One trial evaluates whether RF extracted from lesions describe histologic characteristics, lymph node involvement and extension.

Table 2. List of included articles on RF derived from PSMA-PET images. The second column (#) gives the number of patients enrolled retrospectively (R) or prospectively (P). The fourth column presents the volume of interest (VOI), with the type of segmentation in brackets: M = manual, SA = semiautomatic and A = fully automatic. The last column contains information on validation; the number stands for the number of cohorts used, e.g., 2 means one cohort for development and one for testing. Abbreviations: CV = cross-validation; PCa = prostate cancer; GS = Gleason score; SAE = local binary pattern small-area emphasis; SZNUN = local binary pattern size-zone non-uniformity normalized; QSZHGE = quantization algorithm + short zones high gray-level emphasis.

Table 3. List of included articles on RF derived from imaging modalities other than MRI. The second column (#) gives the number of patients enrolled retrospectively (R) or prospectively (P). The fourth column presents the volume of interest (VOI), with the type of segmentation in brackets: M = manual, SA = semiautomatic and A = fully automatic. The last column contains information on validation: "e" stands for external and "I" for internal validation; the number stands for the number of cohorts used, e.g., 2 means one cohort for development and one for testing.

Discussion
PCa radiomics is an emerging research field with high potential to offer non-invasive, longitudinal biomarkers for personalized medicine. In our review, based on a qualitative synthesis of 77 studies, most papers address MRI-based RF, which is not surprising since MRI is the current SoC for primary PCa staging. Other imaging modalities such as CT, PSMA-PET, TRUS and bone scans are less commonly used, but their use has increased in recent years. This trend might continue with the increasing use of PSMA-PET/CT for staging patients with primary, recurrent and metastatic PCa. One major focus of the included papers was PCa detection. Given that image interpretation and segmentation are hampered by inter-observer variability [6,137], the implementation of RF might enhance diagnostic performance. Advances in automated segmentation of intraprostatic tumor lesions, for example by deep learning-based approaches such as convolutional neural networks, might overcome this limitation [138].
The other focus is GS discrimination, reflecting the need for improvements in risk stratification. This is not surprising, since the GS is the most established histologic biomarker. In clinical routine, the GS before primary PCa therapy is evaluated in tissue cores obtained by biopsy. However, due to intratumoral heterogeneity, the GS in biopsy cores and in prostatectomy specimens is discordant in 20-60% of patients [139,140]. Nevertheless, the bioptic GS has a significant impact on clinical management, as it defines the patient's risk group, influencing, for example, the duration of androgen deprivation therapy or the dose to the prostate during radiation therapy [141]. RF-based GS prediction might account for the intratumoral heterogeneity that leads to over- or underestimation of the GS in biopsy specimens. For instance, Zamboglou et al. demonstrated that a PSMA-PET-derived RF (QSZHGE) may outperform biopsy mapping for GS 7 vs ≥8 discrimination [126]. Recently, Chu et al. examined PSMA expression in a combined cohort of more than 18,000 radical prostatectomy specimens and observed a correlation between PSMA expression and the GS [142]. This finding provides a strong biological rationale for non-invasive GS prediction based on RF extracted from PSMA-PET images.
However, several studies have proposed that a thorough analysis of PCa tissue characteristics (e.g., by genomic analyses) might outperform the GS for risk prediction [143]. Radiogenomics combines RF analysis with genomic information, thus linking both research fields. Our literature search revealed five radiogenomics studies, but none were internally or externally validated, so they were excluded. Nevertheless, they are mentioned here to highlight this modern and innovative approach [25][144][145][146][147]. A pilot study by Sun et al. showed weak correlations between RF and hypoxia-related gene expression, providing an opportunity to assess the hypoxia status in PCa [146]. Two studies by McCann et al. and Switlyk et al. demonstrated an association between RF and the genetic marker phosphatase and tensin homolog (PTEN) [144,147]. Stoyanova et al. identified radiomic signatures that reflected genes over- and underexpressed in aggressive prostate cancer [25]. Additionally, a study by Kesch et al. with a small patient cohort suggested that RF signatures could distinguish between lesions of different aggressiveness [145].
Direct prediction of treatment outcome with RF is mainly being investigated in ongoing clinical trials. A possible explanation is the long follow-up needed to provide reliable information on treatment outcomes in PCa patients. Only a few manuscripts (n = 24) address extraprostatic extension (n = 6), BCR (n = 6), segmentation (n = 4), bone metastasis (n = 3), lymph node detection (n = 3) and radiotherapy toxicity (n = 2). Considering that most PCa patients are long-term survivors after treatment, reliable prediction of toxicity is needed. Due to the lack of predictive models for toxicity, we consider this field of major interest for future studies. Some of the excluded studies featured interesting concepts for the use of radiomics to predict treatment-associated toxicity in PCa patients. Radiotherapy toxicity prediction was investigated for femoral head fractures [148] and for urethral strictures after high-dose-rate brachytherapy [149]. One paper used RF for response assessment of PCa bone lesions derived of and ADC maps [150]. Rossi et al. computed RF not on images but on the 3D dose-volume histogram distributions of the rectum and bladder; this add-on improved the prediction of late toxicities after radiotherapy [151]. These broad fields of application demonstrate the great potential of radiomics and its clinical implementation, from diagnosis to outcome and toxicity prediction, in an era of big data and individualized medicine.
Overall, most of the included studies presented good to high AUC values. However, these findings need to be interpreted cautiously in view of publication bias and the variability observed in RF. As illustrated above, the radiomics pipeline is a sequence of operations, each of which can be modified [31]. RF and models are sensitive to these modifications; consequently, investigations of RF variability, robustness and reproducibility are needed [31].
Texture features become increasingly sensitive to acquisition parameters with growing spatial resolution [152], and they are also sensitive to reconstruction algorithms [153]. Yang et al. proposed a simulation framework to assess the robustness and accuracy of radiomic texture features under different MRI acquisition parameters and reconstruction algorithms [153]. Recently, Rai et al. developed a 3D-printable phantom to measure the repeatability and reproducibility of MRI-based radiomic features, which could facilitate multi-center studies to harmonize imaging protocols and thereby tackle some of these challenges [160].
Multiple segmentations can reduce variability and bias in RF extracted from manually, semiautomatically or automatically segmented VOIs [154]. To increase the robustness of segmentation, manual methods should be avoided. In PET images, Bashir et al. demonstrated that semiautomatic threshold-based methods yield superior inter-observer reproducibility [155]. Additionally, CNN-based segmentation methods showed good performance [156].
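Agreement between segmentations, for example between two observers or between a manual and an automatic delineation, is often summarized with the Dice similarity coefficient, 2|A∩B|/(|A|+|B|). The sketch below is a minimal NumPy version assuming two binary masks of equal shape; the function name is illustrative, and returning 1.0 when both masks are empty is a convention chosen here, not a universal rule.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)

# Hypothetical delineations of the same VOI by two observers
observer_1 = np.array([[1, 1, 0], [0, 1, 0]])
observer_2 = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_coefficient(observer_1, observer_2))  # 4/6, i.e. about 0.667
```

A Dice value close to 1 does not guarantee stable RF, since texture features can be sensitive to small boundary changes; it only quantifies the overlap of the VOIs.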
Isaksson et al. investigated normalization techniques to enhance comparability across different subjects and visits [158]. Scalco et al. investigated different commonly adopted image intensity normalization techniques for T2w MR images and demonstrated a relevant impact on the reproducibility of RF [154].
Schwier et al. investigated the variability of RF in MRI by using different filters, normalization, and image discretization techniques and observed that RF were sensitive to these pre-processing procedures. Hence, they recommended detailed reporting of the pre-processing steps and the use of open-source software [29]. Orlhac et al. reported that ComBat harmonization is efficient and enables MRI data pooling from different scanners and centers [155].
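ComBat estimates and removes scanner- or center-specific (batch) shifts in the mean and variance of each feature. The sketch below shows only the underlying location/scale idea in a deliberately simplified form: each feature is centered and scaled per batch and then mapped back to the overall mean and the pooled within-batch standard deviation. It is not ComBat itself, which additionally applies empirical Bayes shrinkage to the batch estimates and can preserve biological covariates; without such covariates, harmonization can also remove true differences if the case mix differs between centers. The function name and inputs are illustrative assumptions.

```python
import numpy as np

def location_scale_harmonize(X, batch):
    """Simplified per-batch location/scale adjustment (no empirical Bayes, no covariates).

    X: (n_samples, n_features) feature matrix; batch: length-n array of scanner/center labels.
    For features that vary within each batch, every batch ends up with the overall
    mean and the pooled within-batch SD.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    grand_mean = X.mean(axis=0)
    residuals = np.empty_like(X)
    for b in np.unique(batch):
        idx = batch == b
        residuals[idx] = X[idx] - X[idx].mean(axis=0)  # remove batch-specific location
    pooled_sd = np.sqrt(np.mean(residuals ** 2, axis=0))
    out = np.empty_like(X)
    for b in np.unique(batch):
        idx = batch == b
        sd = residuals[idx].std(axis=0)
        sd = np.where(sd > 0, sd, 1.0)  # leave features constant within a batch unscaled
        out[idx] = residuals[idx] / sd * pooled_sd + grand_mean  # remove batch-specific scale
    return out

# Hypothetical single feature measured at two sites with a shift and a scale difference
X = np.array([[1.0], [2.0], [3.0], [11.0], [14.0], [17.0]])
site = np.array(["A", "A", "A", "B", "B", "B"])
H = location_scale_harmonize(X, site)
```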
Two studies investigated the repeatability of MRI-derived RF and concluded that the repeatability of many RF is moderate and that a set of reproducible image features is desirable [156,157]. Delgadillo et al. investigated the repeatability of RF derived from CBCT and reported that only five radiomic features were repeatable in < 97% of the reconstruction and preprocessing methods [159]. Bologna et al. proposed an approach to assess RF stability without multiple acquisitions and segmentations that could be used for preliminary RF selection. In addition, the authors reported that RF derived from ADC maps behave differently depending on the region from which they are extracted, e.g., RF derived from head and neck tumors are less stable than those derived from sarcomas [161]. Pfaehler et al. recommended investigating the repeatability of RF for every tumor type and every PET tracer [30].
These papers demonstrate the fragility of RF and the need for reproducible RF sets to enable broad clinical application.
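Test-retest repeatability of a feature is frequently quantified with an intraclass correlation coefficient (ICC). The sketch below computes ICC(2,1) in the Shrout and Fleiss convention (two-way random effects, absolute agreement, single measurement) from a matrix with one row per patient and one column per repeated scan or segmentation. Which ICC form and which cut-off define a "repeatable" feature differs between studies, so this is one illustrative choice rather than the method used in the cited papers; the function name and example values are assumptions.

```python
import numpy as np

def icc_2_1(Y):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    Y: (n_subjects, k_repeats) array, e.g. one radiomic feature measured on
    k repeated scans of n patients.
    """
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 repeats")
    grand = Y.mean()
    ss_rows = k * np.sum((Y.mean(axis=1) - grand) ** 2)  # between-subject variation
    ss_cols = n * np.sum((Y.mean(axis=0) - grand) ** 2)  # systematic between-repeat variation
    ss_total = np.sum((Y - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    return float((ms_rows - ms_error) / denom)

# Hypothetical feature values on test and retest scans of three patients
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # perfect agreement
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))  # a constant offset lowers absolute agreement
```

Because the absolute-agreement form penalizes the systematic offset in the second example, it is stricter than a consistency-type ICC, which would rate both examples as perfectly repeatable.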
Consequently, more research on the robustness of prostate MRI and PSMA-PET RF should be performed. Another approach to tackling RF variability is the standardization of RF definitions and calculations, which the IBSI seeks to promote [28]. The radiomics quality score, a tool to evaluate the methodological quality of radiomics studies, could also be used [32]. With higher-quality evidence on RF robustness, such as the recent meta-analysis by Zwanenburg et al., pitfalls could be uncovered and described [33]. These methodological aspects seem all the more important since only a few studies identified in this review are explicitly IBSI compliant, and future work needs to focus on this issue. Furthermore, we had difficulty verifying the studies' IBSI compliance, since most studies do not give sufficient information about the software used and the calculation of RF. We therefore advocate uniform and detailed specifications.
Validation is pivotal given the variability of RF. Of 238 articles, 35 were excluded due to missing validation. For internal validation, different approaches may be used, such as k-fold CV or leave-one-out CV, as well as independent datasets for model development and validation. A proper methodology and the separation of training and validation datasets are required at all times [157]. Our synthesis identified 64 articles with internal validation (k-fold CV n = 36; leave-one-out CV n = 11; two cohorts n = 29). Fourteen studies used more than one validation type. External validation is the gold standard and was performed in eight of the identified articles. Only one manuscript reported external validation of an already published model [108]. These findings place even more emphasis on the validation of radiomics models, especially external validation and validation of already published models [58].
Many studies used ML for model building and verification. ML, and deep learning as one of its subfields, are emerging and harbor great potential [48]. Li et al. used deep learning in combination with "hand-crafted" features and successfully applied this approach to differentiate unilateral breast cancer from low-risk patients [46]. Segmentation of PCa lesions by deep learning networks has also been explored without "hand-crafted" features [158]. This review focuses on the clinical aspects of RF, demonstrating their great potential to affect the management of PCa. However, some technical aspects were not investigated further: information on the algorithms used for RF extraction or on the ML approaches was not provided. Additionally, we did not report whether the published models or their parameters are publicly available.
In conclusion, most research in PCa radiomics focuses on PCa detection and GS discrimination. MRI, as the SoC, is currently the most commonly used imaging modality for RF computation, but evidence for PSMA-PET is growing in a wide variety of clinical settings. Most results suggest good to high performance of radiomics models, but they should be interpreted carefully due to RF variability. Further research on RF sensitivity and robustness is needed, especially for RF extracted from prostate MRI and PSMA-PET.