Introduction

Prostate cancer (PCa) is the second most frequently diagnosed cancer in men, and despite many improvements in detection and treatment, it remains one of the leading causes of cancer-related mortality1. Histology from needle biopsy is crucial for risk stratification and for selecting treatment options tailored to the characteristics of each tumor and patient. Historically, samples for histopathology have been obtained by standard transrectal ultrasound (TRUS)-guided biopsy using a systematic scheme. However, TRUS-guided prostate biopsy has long been known to result in undersampling. A further diagnostic uncertainty of this technique is the discordance between needle biopsy and radical prostatectomy (RP) histologic grading2,3, which might lead to under- or overtreatment.

In recent years, multiparametric MRI (mp-MRI) and subsequent MRI-targeted biopsy techniques have proven to be a highly accurate pathway for detecting clinically significant PCa while simultaneously decreasing the detection of clinically insignificant cancers4. The three common MRI-targeted biopsy techniques are visual registration of MRI images with real-time ultrasound images, software-assisted fusion of MRI images and real-time ultrasound images, and MRI-guided in-bore biopsy. The MRI-guided in-bore biopsy technique uses apparent diffusion coefficient (ADC) maps for fine manual adjustments during the procedure and provides real-time feedback on needle placement, which can help to obtain an adequate fraction of the tumor and reach higher concordance between biopsy and final pathology5,6. Although concordance between MRI-targeted biopsy and final pathology from RP has increased, upgrades of the Gleason grade group (GG) are still important, especially for patients with GG1, who are candidates for active surveillance. Active surveillance is a management strategy for low-risk PCa patients designed to avoid overtreatment and the potential side effects of surgery7. High Prostate Imaging Reporting and Data System (PI-RADS) scores and/or large tumor sizes on mp-MRI were reported to be predictive factors of upgrading GG1 lesions8.

In this retrospective study, an analysis of the clinical variables of patients who underwent MRI-targeted in-bore biopsy and subsequent RP was conducted, with two primary objectives: (i) to identify the clinical variables that were pertinent to the upgrade of GG in the final pathology and (ii) to investigate the possibility of predicting GG upgrade through the utilization of machine learning methods on an individual patient basis, thereby providing a foundation for personalized treatment planning.

Methods

Patients

Following the internal review board approval of Koc University, a retrospective examination was conducted on the datasets of 400 men who underwent mp-MRI and subsequent MRI-targeted in-bore biopsy of a high-likelihood target (PI-RADS 4 or 5) at American Hospital (Istanbul, Turkiye) between 2012 and 2022. The research was carried out in adherence to the principles outlined in the Declaration of Helsinki. Among the patients who were diagnosed with PCa, 95 (median age 64, range 42-78) underwent RP as a definitive treatment. Of these, 20 patients were diagnosed with GG1 PCa based on the in-bore biopsy results. Although GG1 patients normally do not require active treatment and are actively surveilled, the shared decision for definitive treatment took into account the following criteria: history of prostate cancer in the father or brother, International Prostate Symptom Score (IPSS) \(>19\), tumor positivity in 2 or more cores, and PI-RADS 4 or PI-RADS 5 lesions larger than 10 mm. The time interval between in-bore biopsy and RP was less than 6 months for most of the patients. None of the patients had received either radiotherapy or hormone therapy before RP. All biopsy cores and radical prostatectomy specimens were evaluated by a dedicated uropathologist with 16 years of experience, according to the recommendations of the International Society of Urological Pathology (ISUP).

Our study focused on per index lesion level. All high-likelihood lesions (PI-RADS 4 and 5) detected on mp-MRI were targeted by in-bore biopsy. Index lesions were depicted according to PI-RADS version 2 guideline9. All index lesions detected on mp-MRI and sampled by in-bore biopsy were confirmed at whole-mount step-section specimens after RP. Since non-index lesions were not clinically related to patient outcomes, they were not analyzed10,11,12,13.

Multiparametric MRI and measurements

All multiparametric MRI examinations were conducted on a 3.0 Tesla MRI scanner (Magnetom Skyra, Siemens AG, Germany) with a sixteen-channel body coil. Butylscopolamine was used to suppress bowel peristalsis during the examination. The MRI protocol included T2-weighted imaging in axial, coronal and sagittal planes, diffusion-weighted imaging (DWI), and dynamic contrast-enhanced pulse sequences (Table 1).

Table 1 Multiparametric MRI protocol.

Tumor size, tumor location and PI-RADS scores were interpreted in consensus by three radiologists. ADC values were measured by two radiologists who were blinded to clinical variables and pathology results. The \(\hbox {ADC}_{\text{mean}}\) values were obtained by drawing a region of interest (ROI) that covered the largest tumor area excluding the tumor edges, while the \(\hbox {ADC}_{\text{min}}\) values were obtained by drawing an ROI on the area that visually depicted the lowest ADC value within the tumor (Fig. 1). Interobserver agreement for the ADC measurements was assessed by correlation, which was 79.3% and 67.7% for the mean and minimum values, respectively. The pathologic interpretation was the same as in our previous publication14.

In-bore biopsy technique

In-bore biopsy was performed in an outpatient setting on the same 3 T MRI scanner. All biopsy procedures were carried out by a single radiologist [M.V.] who had more than 15 years of experience in urogenital radiology and interventions.

During the biopsy, the patients were placed in the prone position. The needle guide, lubricated with 2% lidocaine gel, was inserted into the rectum and attached to a commercially available biopsy device (DynaTRIM, Invivo). To adjust the needle guide placement, sagittal T2W turbo spin-echo images were first acquired and transferred to a workstation (DynaCAD, Invivo). The software then calculated the target’s rough coordinates relative to the needle guide’s tip, and the needle guide was manually adjusted toward the target. ADC maps were also utilized during the manual needle adjustments to guide the needle to the area with the lowest ADC values (Fig. 1).

Following the initial adjustments, repeat sagittal and multiplanar reconstructed axial and coronal T2-weighted images were obtained for further fine manual adjustments until the needle guide was accurately pointed at the designated target (Fig. 2). Biopsy cores were obtained using an MRI-compatible, 18-gauge biopsy gun with needle lengths of 150 or 175 mm (Invivo, Gainesville, FL). To ensure accurate sampling of the targeted lesion, the fired needle was left deployed in the prostate, and sagittal and reconstructed T2-weighted images were acquired. Only the suspicious target detected on pre-biopsy mp-MRI was sampled, without performing a complementary systematic biopsy. During the course of our study, we increased the number of biopsy cores in response to growing evidence that focal saturation can improve the agreement of needle biopsy with whole-mount specimen pathology. On a case-by-case basis, the number of biopsy cores was also affected by the patient’s comorbidities, the history of previous negative biopsy, the size and location of the target, and feedback from needle-in images. The number of biopsy cores obtained per lesion ranged from 2 to 5.

Figure 1

ADC images demonstrate a PI-RADS 5 lesion in the left peripheral zone, which was subsequently confirmed as prostate cancer (Gleason grade 3). The mean ADC and minimum ADC values were measured as shown in panels (A) and (B), respectively.

Figure 2

A 57-year-old patient presented with an elevated level of prostate-specific antigen (PSA) measuring 8.9 ng/ml, accompanied by suspicious findings on digital rectal examination. Multiparametric magnetic resonance imaging (mp-MRI) identified a PI-RADS 5 lesion in the left peripheral zone. Subsequently, an MRI-guided in-bore biopsy was performed and the diagnosis of prostate cancer (Gleason Group 4) was established. Sagittal (A) and axial (B) T2-weighted images, axial ADC image (D) showing biopsy needle positioning. Axial ADC image (C) taken during in-bore biopsy.

Clinical parameters

Pre-biopsy clinical variables include patient age, prostate volume, prostate specific antigen (PSA), PSA density (PSAD), tumor size, tumor location (either in peripheral zone (PZ) or transition zone (TZ)), assigned PI-RADS score, mean and minimum ADC values acquired by diffusion weighted images. Biopsy records include number of biopsy cores, number of positive biopsy cores, the ratio of positive cores to total number of cores, total biopsy core length (CL), total biopsy tumor length (TL), TL/CL ratio, and biopsy-assigned GG. Table 2 shows the characteristics of the patients that are involved in this study.
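The derived variables listed above are simple ratios. As a minimal sketch, assuming PSAD is computed as PSA divided by prostate volume (the usual definition), they could be obtained as follows. The class name, field names, and numeric values are illustrative assumptions and do not come from the study data.

```python
from dataclasses import dataclass


@dataclass
class BiopsyRecord:
    """Hypothetical container; field names are illustrative, not from the study."""
    psa_ng_ml: float           # serum PSA (ng/ml)
    prostate_volume_ml: float  # prostate volume (ml)
    n_cores: int               # total number of biopsy cores
    n_positive_cores: int      # number of tumor-positive cores
    core_length_mm: float      # total biopsy core length (CL)
    tumor_length_mm: float     # total biopsy tumor length (TL)

    @property
    def psad(self) -> float:
        # PSA density: PSA divided by prostate volume
        return self.psa_ng_ml / self.prostate_volume_ml

    @property
    def positive_core_ratio(self) -> float:
        # Ratio of positive cores to total number of cores
        return self.n_positive_cores / self.n_cores

    @property
    def tl_cl_ratio(self) -> float:
        # Fraction of the sampled core length occupied by tumor
        return self.tumor_length_mm / self.core_length_mm


# Made-up values for illustration only
record = BiopsyRecord(8.0, 40.0, 4, 2, 50.0, 10.0)
print(record.psad, record.positive_core_ratio, record.tl_cl_ratio)
```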

Table 2 Patient characteristics comparison for GG upgraded vs non-upgraded cohorts.

Statistical and machine learning analysis

To identify clinical parameters that are predictive of GG upgrade, univariate statistical and multivariate machine learning (ML) analyses were performed. Logistic regression was employed for the univariate statistical tests. An odds ratio (OR) whose 95% confidence interval (CI) excluded 1, together with \(\hbox {p}<0.05\), was considered significant.
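To illustrate the significance criterion, the sketch below fits a single-predictor logistic regression by Newton-Raphson and reports the odds ratio with a Wald 95% CI and a two-sided Wald p-value. It is an assumption-laden illustration: the study does not state which software or CI method was used, the function name is hypothetical, and the data are toy values. The sketch does not handle perfectly separated data.

```python
import math


def univariate_logistic_or(x, y, n_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1*x; return (OR, CI low, CI high, p-value) for b1."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector (g) and information matrix (h) of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * delta = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard error of b1 from the inverse information matrix, evaluated at
    # the last iterate before the final (negligible) step
    se = math.sqrt(h00 / det)
    z = b1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald test
    return (math.exp(b1), math.exp(b1 - 1.96 * se),
            math.exp(b1 + 1.96 * se), p_value)


# Toy data (not from the study): binary predictor, 20 hypothetical patients
x = [0] * 10 + [1] * 10
y = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
odds_ratio, ci_low, ci_high, p_value = univariate_logistic_or(x, y)
significant = (ci_low > 1 or ci_high < 1) and p_value < 0.05
print(odds_ratio, ci_low, ci_high, p_value, significant)
```

For a binary predictor, the fitted odds ratio equals the cross-product ratio of the 2x2 table, which gives a convenient sanity check.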

The baseline prediction accuracy was calculated by comparing in-bore biopsy and radical prostatectomy Gleason grades, and it was used as the benchmark to evaluate the performance of the ML models. ML studies were conducted by selecting algorithms that are robust to overfitting for relatively small datasets such as ours, namely, support vector machine (SVM) with linear and radial basis function (RBF) kernels, least absolute shrinkage and selection operator (LASSO) regression, and ridge regression. To assess the performance of the ML algorithms, we used sensitivity, specificity, the area under the receiver operating characteristic (ROC) curve (AUC)15, and the Youden index16. Our analyses employed 3 different grouping strategies for the patient cohort: (i) we included all patients and studied all patients with a GG upgrade, (ii) we included \(\hbox {GG}>1\) patients and studied all patients with a GG upgrade, and (iii) we included all patients and studied only those with clinically significant upgraded cases, from GG1 to \(\hbox {GG}>1\).
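The threshold-based metrics named above follow directly from the confusion counts. As a brief sketch, with 1 denoting a GG upgrade and 0 denoting no upgrade, they can be computed as shown below. The function name and the labels are toy assumptions, not study data.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, accuracy and Youden index (1 = upgrade)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        # Youden index J = sensitivity + specificity - 1
        "youden": sensitivity + specificity - 1.0,
    }


# Toy labels for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```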

The evaluation of performance metrics was conducted through a rigorous process involving 100 randomly selected train-test splits across the dataset, ensuring a comprehensive examination of the model’s robustness and consistency. We adhered to a train-test split ratio of 80% for training data and 20% for testing data. Furthermore, to assess the model’s generalizability and mitigate the risk of overfitting, we employed a 3-fold cross-validation strategy.
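A minimal sketch of the repeated-split scheme is given below, assuming plain (non-stratified) random shuffling and reporting the mean with its standard error. The study does not specify whether splits were stratified or how the 3-fold cross-validation was nested, so those details are omitted. The function names and the seed are illustrative.

```python
import math
import random


def repeated_splits(n_samples, n_iter=100, test_frac=0.2, seed=0):
    """Yield (train_indices, test_indices) for n_iter random 80/20 splits."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_frac)
    for _ in range(n_iter):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


def mean_and_sem(values):
    """Mean and standard error of the mean of per-split metric values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean, math.sqrt(var / n)


# 95 patients, as in the study cohort; each split holds out 19 for testing
splits = list(repeated_splits(95))
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
```

In such a scheme, a model would be fitted on each training subset and scored on the matching test subset, and `mean_and_sem` would summarize the 100 resulting metric values.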

Informed consent

This retrospective observational study was approved by our Institutional Review Board, and the requirement for written informed consent was waived by the Koc University School of Medicine ethics committee. All experiments, including the study protocol, followed approved institutional guidelines.

Results

In our study cohort, concordance between biopsy and final pathology GG was recorded in 61 (64.2%) patients. Overall upgrading was recorded in 27 (28.4%) patients, whereas 7 (7.4%) patients were downgraded. Six of the downgraded men were lowered by one GG, whereas a single case was downgraded by 2 Gleason grade groups (from GG4 to GG2). Among the 27 upgraded men, the Gleason grade group of 21 (77.8%) patients increased by 1 grade. Upgrades by 2 grades (n=3) and 3 grades (n=3) were observed in the remaining 6 cases. Among 75 men with biopsy \(\hbox {GG}>1\), 10 (13.3%) upgraded cases were observed whereas 58 (77.3%) cases were concordant. Table 3 shows the GG distribution obtained by in-bore biopsy versus RP, where diagonal elements represent concordance. The upper and lower off-diagonal elements represent the cases with GG upgrades and downgrades, respectively. All of the upgraded cases from clinically insignificant to clinically significant PCa (17.9% of our study cohort) consisted of upgrades from Gleason score (GS) 3+3 to 3+4, whereas downgrading to clinically insignificant PCa did not occur. We focused on the statistics of GG upgrade only because of the small number of downgraded cases. Table 2 compares the clinical variables of men whose GG was upgraded after RP with those of men whose GG was not upgraded.
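The way a biopsy-versus-RP table is read can be sketched as follows: with rows as biopsy GG and columns as RP GG, the diagonal counts concordant cases, entries above the diagonal count upgrades, and entries below it count downgrades. The small matrix is a toy example and is not Table 3. Only the final loop uses counts quoted in the text (61, 27 and 7 of 95).

```python
def concordance_summary(table):
    """Summarize a square table with rows = biopsy GG and columns = RP GG."""
    n = len(table)
    total = sum(sum(row) for row in table)
    concordant = sum(table[i][i] for i in range(n))
    # Above the diagonal: RP grade higher than biopsy grade (upgrade)
    upgraded = sum(table[i][j] for i in range(n) for j in range(i + 1, n))
    downgraded = total - concordant - upgraded
    return {"total": total, "concordant": concordant,
            "upgraded": upgraded, "downgraded": downgraded}


# Toy 3x3 table for illustration only (not the study's Table 3)
toy_table = [
    [3, 2, 0],
    [1, 5, 1],
    [0, 1, 4],
]
summary = concordance_summary(toy_table)
print(summary)

# Percentages implied by the counts reported in the text
reported = {"concordant": 61, "upgraded": 27, "downgraded": 7}
rates = {k: round(100.0 * v / 95, 1) for k, v in reported.items()}
print(rates)
```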

Table 3 Confusion table of Gleason grades by MRI-guided in-bore biopsy versus RP pathology. Diagonal elements indicate GG concordance. Upper- and lower-diagonal elements indicate GG upgrade and downgrade cases, respectively.

Statistical analysis

Univariate analyses were conducted using logistic regression, and the results are shown in Table 4. Biopsy GG1 stands out as the most significant predictive factor for a GG upgrade at RP (95% CI 0.06–0.32, \(\hbox {p}<0.0001\)), such that 17 of the 20 patients with GG1 were upgraded. A smaller number of biopsy cores (95% CI 0.3–0.76, \(\hbox {p}=0.002\)) and fewer positive biopsy cores (95% CI 0.35–0.87, \(\hbox {p}=0.011\)) were found to be independent predictive clinical factors by univariate analysis.

The fact that the majority of upgraded patients (17 out of 27) in our study had biopsy GG1 poses the risk that this bulk would saturate our statistics and prevent us from identifying other important upgrade risk factors. Therefore, we repeated the statistical analysis for biopsy \(\hbox {GG}>1\) patients only, where increasing tumor size (95% CI 1.07–3.54, \(\hbox {p}=0.028\)) and decreasing number of biopsy cores (95% CI 0.36–0.97, \(\hbox {p}=0.038\)) were the statistically significant predictive factors. Furthermore, we studied the clinically significant upgraded cases (from GG1 to \(\hbox {GG}>1\)), yet none of the clinical parameters turned out to be significant indicators.

The optimal cutoff thresholds for the statistically significant parameters were found by binarizing each parameter at various thresholds and selecting the threshold that minimized the p-value. The results indicate that the number of biopsy cores and the number of positive biopsy cores should be at least 3 and 2, respectively, to decrease the likelihood of GG upgrade. In addition, for the \(\hbox {GG}>1\) subgroup, a tumor size of 20 mm stands out as the best diagnostic cutoff.
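The cutoff search can be sketched as a scan over candidate thresholds. The text does not name the test used on the binarized variable, so this sketch substitutes a two-sided Fisher's exact test as a stand-in. The function names and data are hypothetical.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in the first row
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


def best_cutoff(values, outcomes, candidates):
    """Return (cutoff, p) minimizing p when splitting into v >= cutoff vs v < cutoff."""
    best = None
    for cut in candidates:
        a = sum(1 for v, o in zip(values, outcomes) if v >= cut and o == 1)
        b = sum(1 for v, o in zip(values, outcomes) if v >= cut and o == 0)
        c = sum(1 for v, o in zip(values, outcomes) if v < cut and o == 1)
        d = sum(1 for v, o in zip(values, outcomes) if v < cut and o == 0)
        if a + b == 0 or c + d == 0:
            continue  # skip cutoffs that leave one side empty
        p = fisher_exact_p(a, b, c, d)
        if best is None or p < best[1]:
            best = (cut, p)
    return best


# Toy data (not from the study): number of cores and upgrade status
cores = [2, 2, 2, 2, 3, 3, 4, 4, 5, 5]
upgraded = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
best = best_cutoff(cores, upgraded, candidates=[3, 4, 5])
print(best)
```

Because the threshold is chosen to minimize the p-value, the resulting p-value is optimistic. A cutoff found this way is best treated as exploratory until confirmed on independent data.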

Table 4 Results of the univariate statistical analysis, given for three different patient groupings: (i) entire cohort, (ii) biopsy \(\hbox {GG}>1\) subgroup patients, (iii) the entire cohort where clinically significant upgraded cases (from GG1 to \(\hbox {GG}>1\)) considered only.

Machine learning

The baseline prediction accuracy set by the in-bore biopsy GG was 64.2% for all patients and 77.3% for the patients with biopsy \(\hbox {GG}>1\). We aimed to improve on this baseline by introducing clinical variables. To select the optimum clinical features that maximize the performance of the ML models, we first scaled all clinical variables to the [0, 1] range and then ranked them according to their chi-square statistics with respect to GG upgrade. First, machine learning models were trained using only the top-ranked feature. Then, at each step, we added the next feature in order and observed its effect on the model performance, measured by the Youden index. At a certain point, the ML models reached a maximum Youden index, and we kept the feature set at that point as our predictive variables. Figure 3a shows the performance of SVM with linear and RBF kernels and of LASSO and ridge regressions as a function of the clinical feature set for the overall patient cohort, after 100 random train-test split iterations. The most favorable results were obtained using an SVM with an RBF kernel (Youden index: \(0.575\pm 0.013\), accuracy: \(0.856\pm 0.004\), sensitivity: \(0.621\pm 0.013\), specificity: \(0.953\pm 0.003\), and AUC: \(0.865\pm 0.007\)) with two predictive clinical features: total number of cores and in-bore biopsy GG.
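The selection loop described above can be sketched as min-max scaling followed by adding pre-ranked features one at a time and keeping the prefix with the highest score. In this sketch the ranking and the scoring function are supplied by the caller. The feature names and Youden values are placeholders, not results from the study, and computing the chi-square ranking and training the models is left out.

```python
def min_max_scale(column):
    """Scale a list of numbers to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in column]


def incremental_selection(ranked_features, score_fn):
    """Add features in ranked order; keep the prefix with the highest score."""
    best_score, best_set = float("-inf"), []
    history = []
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = score_fn(subset)  # e.g. mean Youden index over random splits
        history.append((subset, score))
        if score > best_score:
            best_score, best_set = score, subset
    return best_set, best_score, history


# Placeholder ranking and scores, for illustration only
ranked = ["feature_a", "feature_b", "feature_c"]
toy_youden = {1: 0.40, 2: 0.55, 3: 0.50}
best_set, best_score, history = incremental_selection(
    ranked, lambda subset: toy_youden[len(subset)]
)
scaled = min_max_scale([2, 3, 5])
print(best_set, best_score, scaled)
```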

The same procedure was repeated for biopsy \(\hbox {GG}>1\) patients (see Fig. 3b), where ridge regression yielded optimum results (Youden index: \(0.590\pm 0.024\), accuracy: \(0.904\pm 0.005\), sensitivity: \(0.652\pm 0.023\), specificity: \(0.938\pm 0.004\), and AUC: \(0.944\pm 0.005\)) with 10 predictive clinical features, namely, the total number of cores, PI-RADS score, tumor size, \(\hbox {ADC}_{\text{min}}\), PSAD, in-bore biopsy GG, number of positive cores, prostate volume, core length, and PSA. For \(\hbox {GG}>1\) patients, the steepest improvement in ML model performance was obtained by adding \(\hbox {ADC}_{\text{min}}\) to the feature set. The number of biopsy cores and tumor size also significantly improved the model performance for the entire cohort and biopsy \(\hbox {GG}>1\) patients, respectively. Table 5 shows the overall results of the feature selection study with the means and standard errors of the model performance metrics.

Table 5 Metrics of the best performing ML models in feature selection.

The performance of the machine learning models was also evaluated by receiver operating characteristic (ROC) curves, where the area under the curve (AUC) was used for performance assessment. Figure 4a shows the ROC curves for the four classifier models used; the mean AUC was obtained using 100 random train-test splits on the overall patient group. The RBF SVM model performed best, achieving an AUC of \(0.865\pm 0.007\). Figure 4b shows the ROC curves computed from biopsy \(\hbox {GG}>1\) patients only. Compared with the overall cohort, model performance improved. Ridge regression and linear SVM performed best, with an AUC of \(0.944\pm 0.004\).

The use of ML algorithms significantly increased the predictability of GG at RP. The final pathological GG estimation accuracy of the ML models reached \(0.856\pm 0.004\) and \(0.904\pm 0.005\) for the entire cohort and biopsy \(\hbox {GG}>1\) patient groups, respectively. Compared to the baseline accuracy established by in-bore biopsy alone, these values indicate accuracy enhancements of 21.4 and 13.1 percentage points for the two cohorts, respectively.
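As a check on the arithmetic, the reported enhancements match absolute differences between the ML accuracies and the baseline accuracies of 64.2% and 77.3%:

\[
0.856 - 0.642 = 0.214, \qquad 0.904 - 0.773 = 0.131 .
\]

Expressed relative to the baseline instead, the same gains would be \(0.214/0.642 \approx 0.333\) and \(0.131/0.773 \approx 0.169\). The figures quoted in the text are therefore read here as absolute differences.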

Figure 3

Feature selection by four machine learning models using Youden index as performance metric. Results are shown for (a) overall patient cohort and (b) only biopsy \(\hbox {GG}>1\) group. Error bars indicate standard error of the mean Youden index.

Figure 4

ROC curves for the machine learning models used. Model performances, assessed by AUC metric, using (a) overall patient cohort and using (b) only biopsy \(\hbox {GG}>1\) cases are compared. Shaded regions denote standard error of the mean ROC curves obtained by 100 iterations.

Discussion

Adverse pathology after RP can have serious management consequences, and men with clinically significant disease may be undertreated. Conversely, an overestimated GG would result in overtreatment and hence a reduction in the quality of life of the patient. Therefore, it is of utmost importance to determine the relevant clinical variables that affect GS concordance.

Gleason grade concordance in the literature ranges from 38 to 63%17,18,19,20. Upgraded cases occur at a rate of 25% to 56%17,18,19,20, significantly outweighing downgraded cases, which range from 8% to 16%18,19,21, in the majority of the studies. Although the proportion of upgrades in our GG1 group (17 of 20) is remarkable, Costa et al. reported a 66.7% upgrade rate in the GG1 group22. Liu et al. also reported significant upgrade potential in the GG1 subgroup18. In our study, the GG upgrade rate was comparable to that in recent studies performed with MRI-targeted biopsy techniques and significantly lower than that in studies with TRUS-guided systematic biopsy23,24,25.

A discordant GG between needle biopsy and final pathology is associated with interobserver variability among different pathologists, borderline grades, and, most importantly, sampling errors26. In support of these arguments, Maruyama et al. reported that GG concordance improved by 6.2% after second-opinion pathology27.

In the literature, multiple clinical variables are reported to be predictive for GG discordance, where GG upgrade indicators include older age20, higher PSA20, lower prostate volume28, higher PSAD28, higher PI-RADS score27, and higher tumor percentage in biopsy cores29.

In our study, univariate analysis revealed that age, prostate volume, PSA, PSAD, PI-RADS score, total biopsy core length, total biopsy tumor length, and tumor percentage in biopsy cores were nonsignificant variables for GG upgrade, whereas the number of biopsy cores, number of positive biopsy cores, Gleason grade, and tumor size were found to be significant predictors of GG upgrade. Although the \(\hbox {ADC}_{\text{mean}}\) and the \(\hbox {ADC}_{\text{min}}\) values were found to be irrelevant variables for GG upgrades in univariate analysis, in the \(\hbox {GG}>1\) patient group, multivariate machine learning analysis found the \(\hbox {ADC}_{\text{min}}\) value as a useful variable for predicting GG upgrades.

Although the diagnosis of PCa is shifting to targeted biopsy, no agreement has been reached on the optimum number of cores. Recent studies showed that more than two biopsy cores had no incremental value in determining the GG30,31. However, other publications suggest that additional cores from sextants adjacent to the designated target (so-called focal saturation) can increase biopsy yield and the concordance between needle and final pathology by excluding the effect of GS heterogeneity32. According to Tracy et al., the likelihood of GG upgrade decreases with an increase in the number of targeted cores33. Our results also revealed an inverse correlation between the numbers of total and positive cores and the likelihood of GG upgrade at final pathology. In our study, the rates of any GG upgrading and clinically significant upgrading at final pathology were 28.4% and 17.9%, respectively, with \(3.4\pm 0.8\) cores taken. By comparison, Ahdoot et al. reported rates of 30.9% and 8.7% with \(5.8\pm 2.7\) MRI-targeted cores, and Costa et al. reported rates of 13% and 4.4% with an average of 3.2 MRI-targeted cores22,23.

Intratumoral heterogeneity is a well-known concept, and an increase in the fraction of heterogeneous genetic fusion in parallel with tumor size has been reported in prostate cancer34. Langer et al. showed that peripheral zone prostate cancer is heterogeneous in nature and that 36% of tumors consist of a few scattered malignant glands intermixed with healthy tissue, which are classified as sparse tumors35. To our knowledge, the effect of tumor size on GG upgrade has not been studied in the literature. Our study showed that larger tumors were upgraded more often than smaller tumors, which was statistically significant for the \(\hbox {GG}>1\) subgroup (\(\hbox {p}=0.028\)). Our analysis revealed 20 mm as a threshold for the \(\hbox {GG}>1\) group and showed that tumors over 20 mm have a higher likelihood of being upgraded after biopsy. According to the PI-RADS 2.1 size criteria36, our 20 mm threshold falls into the PI-RADS 5 category. The correlation between PI-RADS scores and Gleason grades is well known37,38,39. In addition, in accordance with our results, a correlation between GG upgrade and PI-RADS score was demonstrated by Alqahtani et al.40. A meta-analysis of active surveillance urged caution regarding active surveillance of PI-RADS 4 and 5 lesions, which may be related to our finding of high upgrade rates in the GG1 group41. Furthermore, our model indicated that tumor size has value for predicting upgrade after in-bore prostate biopsy, which is a novel finding. This finding, if supported by future research with larger series, may have important implications for clinical practice, including considering focal saturation in large tumors.

Diffusion-weighted imaging is a key component of mp-MRI that contributes to tumor detection, as well as to the assessment of tumor aggressiveness. Tissue microstructure such as dense cellularity or atrophic glands can result in distinct imaging findings. Hambrock et al. showed that a high discriminatory performance can be achieved in the differentiation of low-, intermediate-, and high-grade PCa by ADC value42. In the active surveillance patient group, ADC value was identified as an independent predictor of both upgrading on repeat biopsy and time to radical therapy26,43,44. Park et al. reported a significant inverse correlation between the \(\hbox {ADC}_{\text{mean}}\) and \(\hbox {ADC}_{\text{min}}\) values and the likelihood of GG upgrade in a GG1 patient group45. In our study, statistical analysis revealed no significant correlation between the \(\hbox {ADC}_{\text{mean}}\) and \(\hbox {ADC}_{\text{min}}\) values and GG upgrade in either group of patients, whereas in the ML studies the \(\hbox {ADC}_{\text{min}}\) value was found to be useful in the prediction of GG upgrade in the \(\hbox {GG}>1\) patient group. This discrepancy between \(\hbox {ADC}_{\text{min}}\) and \(\hbox {ADC}_{\text{mean}}\) can be explained by the heterogeneous nature of PCa.

Various ML algorithms have previously been used for GG upgrade prediction, including logistic regression18, LASSO regression18,46, SVM18,47, k-Nearest Neighbours (kNN)46, decision trees46, and random forests18,46. Due to the lack of large datasets, medical problems pose a particular challenge for ML models. Many machine learning algorithms require a considerable amount of data; otherwise, the ML model may overfit the training data and generate poor results on the test data. For this reason, ML models such as decision trees and random forests, which require massive datasets, are not suitable candidates for our problem, in agreement with earlier studies in the literature18,46. SVM, ridge and LASSO models were used in this study as they are less prone to overfitting on relatively small datasets.

Our results show that ML-assisted GG estimation accuracy increased by 21.4 percentage points for the overall patient group, surpassing the 13.1-point enhancement for upgrade estimations among \(\hbox {GG}>1\) cases, in line with the literature, where Liu et al.18 showed that ML application improved the prediction accuracy from 39.2% to 71.2%. These accuracy enhancements indicate that ML models are useful tools for utilizing clinical records in personalized treatment planning. Moreover, ML models revealed the significance of more clinical features than statistics alone, such as \(\hbox {ADC}_{\text{min}}\) (see Fig. 3b), underscoring the power of the ML concept, where features considered statistically insignificant can be utilized in predictive models.

The potential limitations of this study are its retrospective design, a small sample size that affects both the statistical and ML studies, and a possible increase in selection bias due to the recruitment of patients over a period of 8 years. Biopsy GG1 patients were upgraded at RP pathology more often than patients in other biopsy Gleason grade groups. This may be due to bias in data collection, as most low-risk GG1 patients are assigned to active surveillance rather than RP. Additionally, even though a 3-fold cross-validation strategy was employed in our study, external validation is crucial for confirming the model’s effectiveness and applicability in different clinical settings. Future studies should aim to incorporate such validation to ensure the model’s reliability and utility in the clinical management of prostate cancer, enhancing its potential contribution to personalized patient care.

Our study pioneers the application of machine learning methodologies to predict upgrades in MRI-guided in-bore biopsy patients and has the second-largest population among studies that compare MRI-guided in-bore biopsy and radical prostatectomy results48. Overall, our study suggests that a combination of clinical factors (the number of biopsy cores, the number of positive biopsy cores, Gleason grade, tumor size and \(\hbox {ADC}_{\text{min}}\) value) and machine learning models may be valuable in predicting the likelihood of GG upgrade following RP and could potentially improve patient outcomes.

Conclusion

Determining the relevant clinical variables that affect GS concordance in MRI-targeted biopsy is of utmost importance in the era of the MRI pathway. Univariate statistics revealed that the number of biopsy cores, the number of positive biopsy cores, and Gleason grade were statistically significant GG upgrade indicators, inversely correlated with the likelihood of GG upgrade. Machine learning analysis found the \(\hbox {ADC}_{\text{min}}\) value to be a useful variable in the prediction of GG upgrade. As a novel finding, tumor size measured by mp-MRI was shown to be positively correlated with GG upgrade likelihood for the \(\hbox {GG}>1\) subgroup. Tumor size and \(\hbox {ADC}_{\text{min}}\) can be useful markers to assess the risk of upgrade prior to biopsy, so the number of biopsy cores and patient selection for active surveillance can be decided in light of these markers. The findings of our study contribute to identifying patients predisposed to GG upgrade at RP. By comparing patient characteristics with our documented outcomes, we can pinpoint high-risk cases for GG upgrade and potentially adjust the threshold for performing RP in such cases.