Deep learning-based image quality assessment: impact on detection accuracy of prostate cancer extraprostatic extension on MRI

Objective To assess impact of image quality on prostate cancer extraprostatic extension (EPE) detection on MRI using a deep learning-based AI algorithm. Materials and methods This retrospective, single institution study included patients who were imaged with mpMRI and subsequently underwent radical prostatectomy from June 2007 to August 2022. One genitourinary radiologist prospectively evaluated each patient using the NCI EPE grading system. Each T2WI was classified as low- or high-quality by a previously developed AI algorithm. Fisher’s exact tests were performed to compare EPE detection metrics between low- and high-quality images. Univariable and multivariable analyses were conducted to assess the predictive value of image quality for pathological EPE. Results A total of 773 consecutive patients (median age 61 [IQR 56–67] years) were evaluated. At radical prostatectomy, 23% (180/773) of patients had EPE at pathology, and 41% (131/318) of positive EPE calls on mpMRI were confirmed to have EPE. The AI algorithm classified 36% (280/773) of T2WIs as low-quality and 64% (493/773) as high-quality. For EPE grade ≥ 1, high-quality T2WI significantly improved specificity for EPE detection (72% [95% CI 67–76%] vs. 63% [95% CI 56–69%], P = 0.03), but did not significantly affect sensitivity (72% [95% CI 62–80%] vs. 75% [95% CI 63–85%]), positive predictive value (44% [95% CI 39–49%] vs. 38% [95% CI 32–43%]), or negative predictive value (89% [95% CI 86–92%] vs. 89% [95% CI 85–93%]). Sensitivity, specificity, PPV, and NPV for EPE grades ≥ 2 and ≥ 3 did not show significant differences attributable to imaging quality. For NCI EPE grade 1, high-quality images (OR 3.05, 95% CI 1.54–5.86; P < 0.001) demonstrated a stronger association with pathologic EPE than low-quality images (OR 1.76, 95% CI 0.63–4.24; P = 0.24). 
Conclusion Our study successfully employed a deep learning-based AI algorithm to classify image quality of prostate MRI and demonstrated that better quality T2WI was associated with more accurate prediction of EPE at final pathology. Supplementary Information The online version contains supplementary material available at 10.1007/s00261-024-04468-5.


Introduction
Beyond its role in detecting clinically significant prostate cancer, multiparametric MRI (mpMRI) plays an important role in preoperative local staging, particularly in depicting extraprostatic extension (EPE). EPE is a significant indicator of prostate cancer aggressiveness and is associated with a higher likelihood of positive surgical margins, increased rates of biochemical recurrence, and decreased overall survival following radical prostatectomy (RP) [1]. Early detection of EPE on mpMRI can influence the choice of treatment and surgical approach, minimizing postoperative complications [2,3]. Studies have suggested that mpMRI might outperform traditional clinical risk calculators in predicting pathological EPE [4,5]. Furthermore, integrating mpMRI with clinical risk assessments enhances the accuracy of predicting pathological EPE [6]. Thus, the National Cancer Institute (NCI) EPE grading system was established to standardize and help improve EPE prediction using T2-weighted imaging (T2WI) [7].
Despite its growing value in the prostate cancer diagnostic workup, mpMRI's role in local staging has faced challenges due to its moderate sensitivity and positive predictive value [8]. These limitations are further exacerbated by high variability in acquisition parameters across centers, which can affect overall image quality and ultimately reader interpretation and diagnostic performance [9]. The Prostate Imaging Reporting and Data System (PI-RADS) established minimum technical requirements and guidelines aimed at improving image quality and reducing variability [10]. To further address these obstacles, the Prostate Imaging Quality (PI-QUAL) scoring system was introduced in 2020 [11]. Since image quality is often "in the eye of the beholder," the literature reflects mixed findings on how mpMRI quality impacts the accuracy of EPE prediction [12–15]. A more standardized method of assessing image quality might be useful in determining its importance in diagnosis.
Prostate MRI quality evaluations are typically conducted by radiologists using either a general assessment approach or specific criteria like PI-QUAL. However, these methods can be subject to variability due to their inherently qualitative nature, presenting a challenge in maintaining consistent standards [16,17]. In this context, deep learning-based artificial intelligence (AI) emerges as a promising tool for the objective assessment of prostate MRI scans, potentially overcoming the variability of human evaluations. Recent studies have shown AI's capability in accurately assessing the quality of T2WI [18] and have examined the impact of AI-based quality evaluation on the performance of MRI-targeted biopsies [19]. Despite these advancements, the clinical impact of prostate MRI quality, especially as assessed by AI, remains largely underexplored. Therefore, this study aims to investigate the impact of T2WI quality on EPE detection using a deep learning-based AI algorithm.

Patient population
This HIPAA-compliant retrospective study was approved by the Institutional Review Board, and written informed consent was obtained from all patients (ClinicalTrials.gov identifiers: NCT03354416, NCT00026884, and NCT02594202). A prospectively maintained institutional database was retrospectively queried for consecutive patients who were imaged with mpMRI and subsequently underwent RP at an academic center from June 2007 to August 2022 (Fig. 1). Patients were excluded from the study if they had received previous prostate cancer treatment (N = 61), or if they were part of the initial cohort used to train the AI algorithm (N = 39). The results from a subset of 604 patients were previously published in a study that evaluated MRI-based staging in predicting biochemical recurrence of prostate cancer after RP [20].

Image acquisition and evaluation
MRI examinations were performed on two 3 T scanners (Achieva 3.0 T TX scanner or Ingenia Elition 3.0 T X, Philips Healthcare, Best, Netherlands), using a 16-channel surface coil (SENSE, Philips Healthcare, Best, Netherlands), with (n = 587) or without (n = 186) an endorectal coil (BPX-30, Medrad, PA, USA). Before imaging, each patient was instructed to undergo an enema to reduce rectal air. T2W turbo-spin-echo MRI, high b-value echo-planar diffusion-weighted imaging (DWI), and gradient recalled echo dynamic contrast-enhanced (DCE) sequences were obtained. Full image acquisition parameters are summarized in Supplemental Table 1.
From 2010 to 2022, scans were prospectively evaluated during clinical readouts by one genitourinary radiologist (B.T., with experience in prostate imaging since 2007). From 2007 to 2010, a different radiologist was responsible for interpreting the examinations, and comprehensive cancer staging evaluations were not part of the clinical workflow at the time. Therefore, for the current study, the aforementioned radiologist (B.T.) conducted retrospective interpretations of examinations from this period. EPE was assessed using the NCI EPE grading system [7]. The 3-point grading system was defined as follows: curvilinear contact length of 1.5 cm or capsular bulge and irregularity were grade 1, the presence of both features was grade 2, and frank capsular breach was grade 3. An EPE grade ≥ 1 was considered a positive EPE call. Only the index lesion (i.e., the lesion with the highest PI-RADS category) of each patient was considered for statistical analysis.
To assess inter-reader agreement on EPE, a subset of MRI scans was evaluated by a second genitourinary radiologist (Y.M.L., with 10 years of experience in prostate cancer imaging) from a different institution. Eighty scans were assessed, consisting of 20 scans randomly selected for each of the four NCI EPE grades (grade 0, 1, 2, 3) based on the interpretations of the first reader. The second reader was blinded to clinical and pathological details, as well as to the first reader's interpretations.
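The grading rule described above can be written as a small decision function. The following is a minimal sketch for illustration only, not the authors' software: the function and parameter names are hypothetical, and it assumes that "curvilinear contact length of 1.5 cm" is read as a length of at least 1.5 cm.

```python
def nci_epe_grade(contact_length_cm, bulge_and_irregularity, frank_breach,
                  contact_threshold_cm=1.5):
    """Return the NCI EPE grade (0-3) for one index lesion.

    Assumption: the contact-length criterion is met at >= 1.5 cm.
    """
    if frank_breach:
        # Frank capsular breach is grade 3 regardless of the other features.
        return 3
    long_contact = contact_length_cm >= contact_threshold_cm
    # Neither feature -> grade 0, one feature -> grade 1, both -> grade 2.
    return int(long_contact) + int(bool(bulge_and_irregularity))


def is_positive_epe_call(grade):
    """A grade of 1 or higher counts as a positive EPE call on MRI."""
    return grade >= 1


# Illustrative lesions (hypothetical values, not study data).
examples = [
    nci_epe_grade(0.8, False, False),  # no feature present
    nci_epe_grade(1.7, False, False),  # contact length only
    nci_epe_grade(1.7, True, False),   # both features
    nci_epe_grade(0.8, False, True),   # frank breach
]
```

Counting the two grade 1 features directly reproduces the "either feature is grade 1, both are grade 2" rule without a chain of conditionals.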
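Agreement between two readers on an ordinal 0 to 3 grade is commonly summarized with Cohen's kappa, which the statistical analysis reports in both unweighted and quadratically weighted form. The sketch below shows one way to compute both in plain Python; the function name and the toy ratings are hypothetical and are not study data.

```python
def cohens_kappa(reader1, reader2, categories, weights="unweighted"):
    """Cohen's kappa for two raters; weights is 'unweighted' or 'quadratic'."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(reader1)

    # Observed joint proportions of (reader1 grade, reader2 grade).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(reader1, reader2):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "quadratic":
            # Penalty grows with the squared distance between grades.
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0

    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    # Disagreement expected by chance from the marginal distributions.
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    return 1.0 - obs / exp


# Toy ratings (hypothetical): three one-grade disagreements in eight scans.
r1 = [0, 0, 1, 1, 2, 2, 3, 3]
r2 = [0, 1, 1, 2, 2, 3, 3, 3]
kappa_unweighted = cohens_kappa(r1, r2, [0, 1, 2, 3])
kappa_quadratic = cohens_kappa(r1, r2, [0, 1, 2, 3], weights="quadratic")
```

Because every disagreement in the toy data is off by a single grade, the quadratically weighted value is higher than the unweighted one, which is the behavior that motivates reporting both for an ordinal scale.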

Radical prostatectomy and histopathologic evaluation
Two urologists (P.A.P. with 23 years of experience or S.G. with 5 years of experience) performed RP. Each surgical specimen was reviewed by one genitourinary pathologist (M.J.M. with 45 years of experience) during clinical workflow, blinded to the mpMRI results. Histopathologic evaluation was performed at RP according to the International Society of Urological Pathology (ISUP) consensus statement [21].

T2W MR image quality assessment AI model
The previously published prostate image quality assessment AI model classified T2WI as high-quality (no quality distortions) versus low-quality (distortions present) [18]. A radiologist (B.T.) evaluated T2WI quality as high- or low-quality based on general distortions (e.g., motion, noise, aliasing) and perceptual distortions (e.g., obscured delineation of the prostatic capsule, prostatic zones, external urethral sphincter, excess rectal gas). This radiologist's assessment was used as the ground truth to train the AI algorithm. The AI model can be accessed via the GitHub repository at: https://github.com/NIH-MIP/Prostate-MRI_T2W_Quality.

Statistical analysis
Pearson's chi-square [22] and nonparametric Wilcoxon-Mann-Whitney tests [23] were conducted to examine differences in the distribution of categorical and continuous variables, respectively. Fisher's exact tests [24] were performed to compare EPE detection metrics (i.e., sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) between the high- and low-quality image groups. The 95% confidence intervals (CIs) of the diagnostic metrics were obtained from 2000 bootstrap samples by random sampling at the patient level. Receiver operating characteristic (ROC) curves were created, and the area under the ROC curve (AUC) was calculated. AUCs of the high- and low-quality image groups were compared using the DeLong test for correlated ROC curves [25]. Univariable and multivariable logistic regressions with backward variable selection based on the Akaike information criterion were applied to identify predictors of pathologic EPE [26]. The unweighted and quadratically weighted Cohen's kappa were used to evaluate agreement between the two readers [27]. Kappa values were categorized as slight (0–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and excellent (0.81–1). All tests were two-sided, and a P value of < 0.05 was considered statistically significant. Statistical analyses were performed using R software (version 4.2.1; R Foundation for Statistical Computing).
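To make the diagnostic metrics, the group comparison, and the bootstrap intervals concrete, the following sketch implements them in plain Python. It is an illustration under stated assumptions rather than the study's R code: all function names are hypothetical, the whole-cohort 2x2 table is reconstructed from the counts given in the abstract (180/773 with pathologic EPE, 131/318 positive calls confirmed), and the per-group counts passed to the Fisher test are invented round numbers because the per-group numerators are not given in the text.

```python
import random
from math import comb


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bootstrap_ci(patients, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling whole patients with replacement.

    `patients` is a list of (mri_positive, pathology_positive) pairs.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(patients, k=len(patients))
        tp = sum(1 for m, p in sample if m and p)
        fp = sum(1 for m, p in sample if m and not p)
        fn = sum(1 for m, p in sample if not m and p)
        tn = len(sample) - tp - fp - fn
        stats.append(diagnostic_metrics(tp, fp, fn, tn)[metric])
    stats.sort()
    lower = stats[int(round(alpha / 2 * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


# Whole-cohort table implied by the abstract's counts.
tp, fp = 131, 318 - 131
fn = 180 - tp
tn = 773 - tp - fp - fn
overall = diagnostic_metrics(tp, fp, fn, tn)

patients = ([(True, True)] * tp + [(True, False)] * fp
            + [(False, True)] * fn + [(False, False)] * tn)
spec_ci = bootstrap_ci(patients, "specificity")

# HYPOTHETICAL counts (not study data): true negatives and false positives
# among pathology-negative patients in the high- and low-quality groups.
p_value = fisher_exact_two_sided(72, 28, 63, 37)
```

Comparing specificity between groups reduces to a 2x2 table of true negatives and false positives by image-quality group, which is why a single Fisher helper suffices; the other metrics use the analogous pair of cells.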

AI T2W image quality assessment
The AI algorithm classified 493 of 773 (64%) T2WI as high-quality and 280 of 773 (36%) T2WI as low-quality. Examples of high- and low-quality scans are shown in Figs. 2 and 3, respectively.

Discussion
The detection of EPE, a critical indicator of prostate cancer aggressiveness, is crucial for guiding treatment decisions and surgical strategies. However, the interpretation of mpMRI and detection of EPE are not without challenges, partly due to the variability in image quality and the subjective nature of radiological assessments. AI holds the potential to support physicians in objectively and swiftly evaluating the quality of MRI scans. Thus, this study investigated the impact of T2W image quality, assessed with a previously developed AI algorithm, on the detection of EPE in patients undergoing RP. While image quality did not significantly affect sensitivity, PPV, or NPV, a notable improvement in specificity for EPE detection was observed for high-quality T2WI in NCI EPE grade ≥ 1 (72% vs. 63%, P = 0.03). Additionally, both univariable and multivariable analyses showed that NCI EPE grade 1 high-quality images demonstrated a stronger association with pathologic EPE than low-quality images.
The current literature reports a wide range of accuracy in predicting pathologic EPE, which may stem from variations in measurement metrics and modality of assessment of radiologic EPE [28–30]. In our study, we evaluated the presence of radiologic EPE via the NCI EPE grading system, which has the benefits of simplicity and standardization. Our results on the prediction of EPE using mpMRI underscored its high specificity yet modest sensitivity. This aligns with findings from a meta-analysis which evaluated the diagnostic performance of mpMRI for identifying EPE, with a pooled sensitivity and specificity of 0.57 and 0.91, respectively [31].
Another meta-analysis investigating the NCI EPE grading system reported a hierarchical summary AUC of 0.82 for EPE prediction [32], which is consistent with the AUC of 0.74 in our study. As prostate MR image quality is a fairly new research area, most studies have focused on the impact of PI-QUAL score on EPE prediction [12–15]. In a retrospective study with 146 patients, Coelho et al. [14] found that PI-QUAL score does not affect the overall accuracy of EPE prediction. Specifically, the AUC was 0.75 for images with a PI-QUAL score of 3 or less, and 0.705 for images with a PI-QUAL score of 4 or higher. PI-QUAL score did not show correlation with EPE prediction in either univariable or multivariable analyses. Due to the limited sample size (n = 146), statistical significance for certain diagnostic measures, such as specificity, was not evaluated in their study. With a much larger study population, the current study demonstrated a statistically significant difference in specificity for NCI EPE grade ≥ 1. We also found that high-quality images were associated with higher ORs for predicting EPE across all grades. Notably, NCI EPE grade 1 with low-quality images was not a significant predictor for pathologic EPE on multivariable analysis. These findings suggest that AI-based imaging quality assessments could significantly influence patient risk stratification based on T2WI quality, thus enabling more personalized therapeutic strategies. In another retrospective study with 105 patients, Ponsiglione et al. found that specifically for EPE grade 3, accuracy was higher in studies with PI-QUAL ≥ 4 than in those with PI-QUAL < 4 (0.849 vs. 0.564, P = 0.001) [13]. This contrasts with our results. However, it is important to note that the PI-QUAL scoring system utilizes all mpMRI sequences, whereas the AI model we used in our study focused only on T2WI, which is recognized as the most critical anatomic pulse sequence for detecting EPE. This distinction highlights the importance of pulse sequence-specific analysis in enhancing the precision of EPE prediction.
Research regarding automated AI for evaluating the quality of prostate MR images is still emerging, with few studies available thus far [18,33,34]. Nonetheless, the findings to date are encouraging. One AI model demonstrated near-perfect accuracy in its testing phase [33]. The AI model used in our study achieved an accuracy of 84.7% in 1046 scans during its development phase [18]. This level of accuracy establishes the potential of AI to significantly enhance the assessment of image quality, setting a solid foundation for further clinical applications. For example, a study using this AI algorithm to evaluate the impact of T2W image quality on prostate cancer detection rates found that higher quality T2WIs were associated with higher rates of clinically significant cancer detection for PI-RADS 4 lesions [19]. Looking ahead, it is plausible that AI-driven image quality assessment will be seamlessly integrated into clinical and research workflows, ensuring uniform image quality. Furthermore, this AI model has the potential for real-time application during scans, offering prompt assistance to technologists in making informed decisions about the necessity of rescans [16].
Our study has some limitations. Its retrospective nature and reliance on a single institution's dataset may introduce selection bias. The interpretation of MRI scans was conducted by one radiologist, and RP and pathology assessments were performed by specialists in their respective areas. However, all of these were done as part of routine clinical practice rather than in a research setting, which mirrors the real-life scenario in an academic clinical practice setting. The study population consisted of patients undergoing RP, which might include different clinical or imaging characteristics from non-surgical populations. The results of the multireader analysis suggested some interobserver variability. These factors might limit the generalizability of the findings. Future research should aim to validate these results in a multicenter study, incorporating a larger and more diverse patient cohort. Of note, the high-quality T2WI group had significantly higher prostate-specific antigen levels and prostate volumes compared to the low-quality group. These clinical variables could potentially influence the assessment of image quality and the evaluation of EPE. Additionally, the univariable and multivariable analyses suggested a trend of increasing ORs for predicting pathologic EPE with higher image quality. However, the wide overlap of CIs, particularly for EPE grades 2 and 3, indicates that while the ORs are higher for high-quality images, the clinical significance may require careful interpretation. Moreover, the AI model evaluated in this study is limited to assessing the quality of T2WI. In practice, radiologists may utilize additional sequences, including DWI and DCE MRI, for evaluation of EPE. Our group is actively developing AI models to evaluate the quality of these functional MRI pulse sequences.
In conclusion, this study demonstrated the significant impact of T2W image quality, assessed by an AI algorithm, on the detection of EPE in patients undergoing RP. The findings revealed that high-quality T2WI significantly improved the specificity for NCI EPE grade ≥ 1, and that NCI EPE grade 1 was associated with pathologic EPE only when high-quality images were utilized. Given the challenges in EPE detection and the variability in MRI quality, integrating AI-based image quality assessments could provide a promising solution for more tailored and standardized prostate cancer evaluations.

Fig. 1
Patient flow diagram of the study. mpMRI multiparametric MRI, AI artificial intelligence

Table 1
Patient demographics and characteristics. Unless otherwise specified, data are numbers of patients. Numbers in parentheses indicate the percentages. ISUP International Society of Urological Pathology, mpMRI multiparametric MRI, PSA prostate-specific antigen, PI-RADS Prostate Imaging Reporting and Data System, NCI National Cancer Institute, RP radical prostatectomy, EPE extraprostatic extension, T2WI T2-weighted imaging. *Data are median values, with IQRs in parentheses. a Data are missing for 40 low-quality T2WI scans and 66 high-quality T2WI scans.

Table 2
Diagnostic measures for detecting EPE. Data in parentheses are 95% confidence intervals and data in brackets are numerator/denominator. NCI National Cancer Institute, EPE extraprostatic extension, T2WI T2-weighted imaging, PPV positive predictive value, NPV negative predictive value, AUC area under the receiver operating characteristic curve

Table 3
Univariable and multivariable logistic regression model for pathologic EPE risk prediction. CI confidence interval, PSA prostate-specific antigen, PSAD prostate-specific antigen density, PI-RADS Prostate Imaging Reporting and Data System, ISUP International Society of Urological Pathology, NCI National Cancer Institute, EPE extraprostatic extension