Prediction and Mapping of Intraprostatic Tumor Extent with Artificial Intelligence

Take Home Message: A multimodal artificial intelligence (AI) model, trained to estimate prostate cancer probability in three dimensions, was retrospectively validated using prostatectomy data. AI-generated focal treatment margins outperformed conventional techniques. Furthermore, the AI model predicted negative margin probability accurately.

potentially reducing cancer recurrence rates. Furthermore, an accurate assessment of negative margin probability could facilitate informed decision-making for patients and physicians. Patient summary: Artificial intelligence was used to predict the extent of tumors in surgically removed prostate specimens. It predicted tumor margins more accurately than conventional methods.

Introduction
Focal therapy (FT) is gaining acceptance as an alternative to conventional whole-gland treatment for patients with intermediate-risk prostate cancer (PCa) [1]. Leveraging a variety of ablative technologies [2][3][4][5][6], FT has the potential to preserve quality of life while conferring metastasis-free and overall survival rates comparable with radical prostatectomy [7]. Most FT studies rely upon magnetic resonance imaging (MRI) to identify and delineate PCa foci, but MRI-visible regions of interest (ROIs) consistently underestimate the true size and extent of PCa [8][9][10]. Thus, treatment margins beyond MRI-visible tumor boundaries are critical to the success of FT.
Margins defined using conventional approaches have commonly treated the entire tumor-bearing hemisphere [2][11][12][13]. However, hemigland ablation is often suboptimal, resulting in the treatment of large volumes of benign tissue and undertreatment of bilateral tumors [14]. Others have applied a 1-cm uniform margin around ROI(s) [15,16] despite the frequently asymmetrical nature of tumor extension [8]. Both of these strategies fail to account for patient-specific imaging, biopsy, and biomarker data that may warrant larger or smaller margins.
Artificial intelligence (AI) techniques have the potential to improve the accuracy of prostate tumor delineation [17][18][19][20][21]. Furthermore, tracked biopsy data derived from MRI-ultrasound fusion devices [22] could complement imaging data. An AI-driven approach that incorporates multimodal biopsy, imaging, and clinical features may better define treatment margins than MRI alone. In this retrospective study, we estimated the extent of clinically significant disease and the likelihood of negative surgical margins using an AI model recently cleared by the US Food and Drug Administration ("Avenda Health AI Prostate Cancer Planning Software," K221624). We hypothesized that tumor margins defined by AI would surpass the accuracy of conventional margins, potentially improving the outcomes of focal treatment.

Patients and methods
First, the AI model was trained using tracked biopsy data. Next, the encapsulation confidence score (ECS) was developed using a whole-mount (WM) histopathology calibration dataset. Lastly, the AI model and ECS were retrospectively evaluated using a WM test dataset, which was independent from training and calibration data.

AI model training
The AI model was trained using multi-institutional, consecutively accrued biopsy data. Model input data (Fig. 1A) were multimodal, consisting of T2-weighted MRI, prostate and ROI segmentations, biopsy coordinates derived from an MRI-ultrasound fusion platform, serum prostate-specific antigen (PSA), and biopsy histopathology including cancer length and International Society of Urological Pathology (ISUP) grade.
A high-level model architecture is shown in Figure 1B. For each sample point, an eight-layer three-dimensional (3D) convolutional neural network [23] generated an image feature vector from T2-weighted MRI. Additional features were generated based on serum PSA and spatial relationships between biopsy cores, ROIs, and the prostate capsule. Spatial and image features were then concatenated and classified via a gradient-boosted decision tree (XGBoost [24]). Five-fold cross validation was used to generate five models, which were averaged to produce a single prediction between 0 and 1: the estimated probability of clinically

AI model and ECS evaluation in an independent test set
The test dataset was wholly independent of model training data; it was drawn from a different region and institution (Stanford University), a different population, and a different MRI vendor. The same inclusion criteria listed in Section 2.2 were applied to consecutively accrued radical prostatectomy cases, resulting in a test dataset of N = 50 patients.
All MRI scans were acquired at 3 Tesla; other parameters varied (Supplementary material and Supplementary Table 5). T2-weighted MRI was the only image input of the AI model, although it also incorporated ROIs that were prospectively identified using multiparametric MRI (T2-weighted, diffusion-weighted, and perfusion-weighted images) [26]. Descriptive statistics for the WM test dataset are presented in Table 1.
CEMs were generated and thresholded to produce a set of AI model margins for each case. The model's receiver operating characteristic was evaluated using the full spectrum of AI model margins. A single "default" AI margin was also produced using the patient-specific thresh- All analyses were restricted to voxels between the most apical and basal WM slide, excluding regions where ground truth was not known.
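The thresholding step described above can be illustrated with a minimal sketch. It assumes the CEM is available as a flat sequence of per-voxel cancer probabilities and that whole-mount ground truth is a matching sequence of booleans; the function names (`margin_from_cem`, `sensitivity_specificity`, `has_negative_margin`, `roc_points`) are hypothetical and are not taken from the study's software.

```python
from typing import List, Sequence, Tuple


def margin_from_cem(cem: Sequence[float], threshold: float) -> List[bool]:
    """Voxels whose estimated cancer probability meets the threshold lie inside the margin."""
    return [p >= threshold for p in cem]


def sensitivity_specificity(margin: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Voxel-wise sensitivity and specificity of a margin against ground truth."""
    tp = sum(m and t for m, t in zip(margin, truth))
    fn = sum((not m) and t for m, t in zip(margin, truth))
    tn = sum((not m) and (not t) for m, t in zip(margin, truth))
    fp = sum(m and (not t) for m, t in zip(margin, truth))
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def has_negative_margin(margin: Sequence[bool], truth: Sequence[bool]) -> bool:
    """True when every cancer-bearing voxel falls inside the margin."""
    return all(m for m, t in zip(margin, truth) if t)


def roc_points(cem: Sequence[float], truth: Sequence[bool],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """One (1 - specificity, sensitivity) point per threshold, i.e. per candidate margin."""
    points = []
    for thr in thresholds:
        sens, spec = sensitivity_specificity(margin_from_cem(cem, thr), truth)
        points.append((1.0 - spec, sens))
    return points
```

Lowering the threshold enlarges the margin, which raises sensitivity and the chance of a negative margin at the cost of specificity; sweeping the threshold traces out the receiver operating characteristic.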
Wilcoxon signed-rank tests were used for pairwise comparisons, and chi-square tests were used to compare negative margin rates.
Kolmogorov-Smirnov tests and a linear regression curve fit were used to evaluate ECS accuracy. Only trivial code was used for statistical analyses. All metrics were measured using automated scripts written in MATLAB R2020b (MathWorks, Natick, MA, USA).
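As an illustration of the chi-square comparison of negative margin rates, the following sketch computes a Pearson chi-square test on a 2x2 table. It is written in Python rather than the MATLAB used in the study, the function name `chi_square_2x2` is hypothetical, and the absence of a continuity correction is an assumption, since the exact test configuration is not stated.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Rows are the two margin strategies; columns are negative and positive margins.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Illustrative counts only (an assumption): 45/50 vs 41/50 negative margins.
stat, p_value = chi_square_2x2(45, 5, 41, 9)
```

Because the counts and test settings are assumed, the resulting p-value is not expected to reproduce any value reported in this study exactly.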
Though PI-RADS ROIs with no added margin had high specificity (97.9%), mean sensitivity was very low (37.4%) and no ROI (0/50) achieved negative margins for csPCa or the index lesion. Application of 10-mm margins greatly improved ROI performance. Compared with 10-mm ROI margins, AI margins had a lesser extent of missed csPCa (mean 1.6 vs 3.2 mm, p < 0.001) but also lower specificity (mean 51% vs 63%, p < 0.001). Sensitivity (97% vs 93%) and the negative margin rate for both csPCa (80% vs 74%) and the index lesion (90% vs 82%) were higher for AI but not significantly different. Figures 3D-F show an exemplary comparison of AI margins with 10-mm ROI margins.
Observed negative margin rates were strongly correlated with ECS predictions (Fig. 4), with an R² value of 0.98. The median error of ECS predictions was 4% (interquartile range 2-6%), and there were no significant differences between the observed and expected negative margin rate distributions (p = 0.97). There were likewise no significant differences (p = 0.85 and p = 0.30) for ISUP grade 2 and 3 subpopulations, though predictions were more accurate for grade 2 (1% vs 5% median error).
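The calibration metrics reported for the ECS (R² of a linear fit and median prediction error) can be computed as in the sketch below. It assumes paired lists of predicted and observed negative margin rates, one pair per CEM threshold; the function names and any data passed to them are hypothetical, and this is not the authors' MATLAB code.

```python
import statistics
from typing import Sequence, Tuple


def linear_fit_r2(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares line y = slope * x + intercept, plus its R-squared."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - ss_res / syy


def median_abs_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Median absolute difference between predicted and observed rates."""
    return statistics.median(abs(p - o) for p, o in zip(predicted, observed))
```

A well-calibrated score would give a slope near 1, an intercept near 0, and a small median error.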

Discussion
Precision management of PCa has the potential to optimize therapy while preserving quality of life, but targeted treatment first requires accurate tumor localization. PI-RADS ROIs are known to underestimate tumor extent [8][9][10], and in our study, treatment of the original ROI would have resulted in positive margins for every patient. It is clear that current multiparametric MRI contouring protocols, which were developed for diagnosis, are not suitable for targeted treatment. We developed a novel AI-driven approach and platform to address this shortcoming, combining multimodal information (MRI, tracked biopsy, and PSA) to produce CEMs and define optimal margins. In the independent test set, AI margins exceeded the negative margin rate of hemigland margins for both index lesions (90% vs 66%) and any csPCa (80% vs 56%). A combination of index lesion underestimation and csPCa-bearing satellite lesions caused hemigland margins to miss csPCa in nearly half of cases. This finding is consistent with the 41-48% undetected contralateral csPCa rate reported by Johnson et al [14] and strongly suggests that, when feasible, a more patient-specific FT planning approach should be used.
AI margins decisively outperformed hemigland margins despite the nearly identical mean volume of the two approaches. However, AI was not infallible, and positive AI margins occurred most frequently (7/10) when large satellite lesions were upgraded from ISUP grade 1 on biopsy to grade 2+ on prostatectomy. The presence of MRI-visible grade 1 or multiple grade 1 biopsy cores outside a margin were risk factors for failure. Thus, due to the prevalence of multifocal csPCa and risk of positive margins, FT inclusion criteria should account for nonmicrofocal grade 1 disease foci.
The relatively strong performance of 10-mm ROI margins is consistent with the findings of Brisbane et al [22], who reported that 90% of csPCa-bearing cores were within 10 mm of an ROI. The present study was not powered for comparisons between AI and 10-mm margins; although AI margins had numerically superior sensitivity (97% vs 93%, p = 0.24) and index tumor negative margin rate (90% vs 82%, p = 0.24), these differences fell short of statistical significance. Nevertheless, a significant difference was observed in tumor extension beyond margin boundaries (mean 3.2 mm for 10-mm ROI margins vs 1.6 mm for AI margins, p = 0.001). This finding is consistent with the mean 13.5 mm of ROI tumor extent underestimation reported in prior publications [8].
The ECS, which predicts negative margin probability, was shown to be accurate across the full breadth of CEM thresholds. The correlation between predicted and observed negative margin rates was remarkably linear (R² = 0.98), demonstrating accuracy in a test population independent of the training data. Such a tool could be of considerable clinical value, helping inform risk assessment and therapy selection. Furthermore, simple transformation of this metric (1 - ECS) yields the probability of significant tumor outside a margin, that is, the expected rate of post-treatment residual disease. The ECS may thus allow physicians to more precisely balance the risk of residual tumor with the risk to quality of life for an individual patient.
The CEM and ECS were developed primarily to facilitate the selection of focal treatment margins, but these may have additional applications in PCa care. For example, a randomized controlled trial of external beam radiation recently showed that biochemical disease-free survival was higher when boosting dosage to the MRI-visible index lesion [27], and CEM-defined margins have the potential to further improve "focal boosting" performance. The CEM and ECS could also be useful when planning radical prostatectomy, since neurovascular bundles outside the AI margin could be spared during surgery. There may even be applications in active surveillance programs where the CEM could aid in sampling the MRI-invisible tumor "penumbra" [22], identifying patients at risk for progression and determining therapy options for patients who desire definitive treatment.
Our study had four noteworthy limitations. First, the test population was by necessity derived from a radical prostatectomy dataset, likely with larger and more advanced disease than the average FT patient. This factor was mitigated by selecting only plausible FT candidates via the study inclusion criteria. Second, the test set was derived from a single institution, which may not be representative of broader populations. However, the algorithm was trained on multi-institutional data, wholly independent from the test dataset. Third, analyses were limited to a comparison of default AI margins against the most commonly cited conventional planning methods. The performance of margins generated by actual physician readers, with and without access to the AI model, will be investigated in a separate study. Fourth, the presented algorithm incorporates only T2-weighted imaging due to limited availability of multiparametric MRI for training and test datasets. The absence of diffusion-weighted imaging is a significant limitation since it correlates strongly with tumor presence [28] and is used to delineate peripheral zone lesions under PI-RADS guidelines [25]. Future work will include incorporation of diffusion-weighted MRI, likely bolstering AI model performance relative to static methodology such as hemigland and ROI-based margins. We also plan to evaluate AI performance in additional populations, improving statistical power and yielding more definitive comparisons.

Conclusions
An AI model was developed to define margins and assess positive margin risk during FT. In a retrospective evaluation of WM prostatectomy data, the model was shown to be accurate and effective for both of these applications. This approach could help improve and standardize focal treatment margins, potentially reducing cancer recurrence rates. Furthermore, the ECS's accurate assessment of negative margin likelihood and residual tumor risk could help facilitate informed decision-making for both patients and physicians. Prospective studies are warranted, as AI-enabled cancer mapping shows considerable promise for patient-specific treatment planning and personalized medicine.
Author contributions: Richard E. Fan and Geoffrey A. Sonn had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.