Diagnostic accuracy of different computer-aided diagnostic systems for prostate cancer based on magnetic resonance imaging

Abstract Background: Computer-aided detection (CAD) systems for accurate and automated prostate cancer (PCa) diagnosis have been developed; however, the diagnostic test accuracy of different CAD systems remains controversial. This systematic review aimed to assess the diagnostic accuracy of CAD systems based on magnetic resonance imaging for PCa. Methods: The Cochrane Library, PubMed, EMBASE, and China Biology Medicine disc were systematically searched until March 2019 for original diagnostic studies. Two independent reviewers selected studies on CAD-based magnetic resonance imaging diagnosis of PCa and extracted the requisite data. Pooled sensitivity, specificity, and the area under the summary receiver operating characteristic curve were calculated to estimate the diagnostic accuracy of CAD systems. Results: Fifteen studies involving 1945 patients were included in our analysis. The diagnostic meta-analysis showed that the overall sensitivity of CAD systems ranged from 0.47 to 1.00 and specificity from 0.47 to 0.89. The pooled sensitivity of CAD systems was 0.87 (95% CI: 0.76–0.94), the pooled specificity 0.76 (95% CI: 0.62–0.85), and the area under the curve (AUC) 0.89 (95% CI: 0.86–0.91). Subgroup analysis showed that support vector machines produced the best AUC among the CAD classifiers, with sensitivity ranging from 0.87 to 0.92 and specificity from 0.47 to 0.95. Among the different zones of the prostate, CAD systems produced a better AUC in the transitional zone than in the peripheral zone and central gland, with sensitivity ranging from 0.89 to 1.00 and specificity from 0.38 to 0.85. Conclusions: CAD systems can help improve the diagnostic accuracy of PCa, especially when using the support vector machine classifier. Whether the performance of the CAD system depends on the specific locations of the prostate needs further investigation.


Introduction
Prostate cancer (PCa) is the most frequently diagnosed cancer among men in over one-half (105 of 185) of the countries of the world, notably in the Americas, Northern and Western Europe.
It is also the leading cause of cancer death among men in 46 countries, particularly in Sub-Saharan Africa and the Caribbean. [1][2][3] An estimated 1.3 million new cases of PCa and 359,000 associated deaths occurred worldwide in 2018, accounting for 7.1% of all new cancers diagnosed worldwide and ranking PCa as the second most frequent cancer and the fifth leading cause of cancer death in men. [1,3] Accurate and early detection of PCa ensures that patients receive treatment promptly, which can help prevent further progression and metastasis and improve survival. [4] Reliable and early detection of PCa has therefore become an important priority in the field of urologic oncology. For the past 25 years, prostate-specific antigen testing has been the standard first-line test for the diagnosis of PCa, followed by transrectal ultrasound-guided biopsy; this approach has decreased PCa-related mortality by 20% to 30%, [5] but it carries substantial diagnostic errors, undersampling and understaging PCa and resulting in overtreatment-related morbidity such as incontinence and impotence. [6,7] Over the past decade, multi-parametric magnetic resonance imaging (mp-MRI) has become the dominant non-invasive tool for diagnosing and grading PCa. [8] 3 Tesla mp-MRI enables detection of 50% of all PCa lesions and 80% of clinically significant lesions. [9] However, one of the main limitations of mp-MRI is that its interpretation requires experienced radiologists capable of analysing data extracted from the different MR sequences, which can lead to high inter- and intra-reader variability in diagnosis. [10] Automated and accurate PCa detection from mp-MRI sequences could minimize the time required to interpret the images, alleviate the requirement for expertise in radiological reading, reduce the risk of over- and under-treatment, and enable large-scale PCa screening.
In the past decade, several computer-aided detection systems (CADs) [11][12][13][14] for accurate and automated PCa detection and diagnosis have been developed. An increasing number of studies have indicated that CAD systems have the potential to support the radiologist by flagging suspicious regions and reducing oversight and perception errors. [15] In addition, some CAD applications have been shown to be time-efficient. [16] However, the diagnostic test accuracy of different CAD systems remains controversial.
We therefore conducted a systematic review and meta-analysis to: (1) evaluate the diagnostic accuracy of CAD systems that analyze MRI of the prostate and provide a malignancy assessment; (2) determine which CAD classifier is superior for the diagnosis of PCa; and (3) determine whether the performance of the CAD system depends on the specific regions of the prostate.

Materials and methods
We conducted our systematic review and meta-analysis according to the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines. [17] The study has been registered in PROSPERO (registration number CRD42019132543). Ethics approval was not required for this systematic review, as it involved the collection and analysis of secondary data.

Inclusion and exclusion criteria
Studies were eligible for inclusion if they fulfilled the following criteria: (1) patients were 18 years of age or older and had suspicious clinical symptoms of PCa; (2) a computer-aided system was used to diagnose PCa; (3) biopsy served as the reference standard; and (4) study data were based on MRI.
Duplicate articles, review articles, editorials, case reports, summaries, animal and cell studies, meta-analyses, letters, comments, and other irrelevant article types were excluded.

Data sources
The Cochrane Library, PubMed, EMBASE, and China Biology Medicine disc were systematically searched from their inception to March 2019. We searched using MeSH terms and free-text terms, as follows: ("prostatic neoplasm*" OR "prostate neoplasm*" OR "prostate cancer*" OR "prostatic cancer*" OR "prostate tumor*" OR "prostatic tumor*") AND ("artificial intelligence" OR "deep learning" OR "computer-assisted" OR "machine learning" OR "neural network*" OR "AI" OR "computational intelligence" OR "machine intelligence" OR "computer reasoning" OR "automated") AND ("diagnosis" OR "diagnos*" OR "detection" OR "sensitivity" OR "specificity" OR "accuracy" OR "positive likelihood" OR "negative likelihood" OR "ROC"). More search details are shown in Appendix 1. In addition, the reference lists of identified studies were manually checked to include other potentially eligible trials. Finally, we transferred all relevant titles and abstracts to EndNote Web for selection.

Study Selection and data extraction
Two authors (X.X.P and Z.X.K) independently screened the titles and abstracts of all retrieved records, evaluated potentially relevant full texts, and determined eligibility. Disagreements were resolved through discussion and consensus, or by consulting a third member (L.Y.D) of the review team.
Two reviewers (X.X.P, W.H.P) independently extracted the following information: (1) basic characteristics of the included studies: first author, year of publication, country, patient numbers, patient ages, study design, prostate-specific antigen level (ng/mL), testing set, and reference standard; (2) details of the different CAD systems: field strength, classifier, steps of the CAD system, and imaging sequences used in the system; and (3) diagnostic data: total numbers of true positives, true negatives, false positives, and false negatives, as well as the accuracy, sensitivity, and specificity of the CAD results.
Discrepancies were discussed and resolved by consensus with a third reviewer.

Methodological quality assessment
Two authors (X.X.P and Z.X.K) independently evaluated the methodological quality of each eligible study using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, a revised quality assessment tool developed specifically for systematic reviews of diagnostic accuracy studies. [18] Discrepancies were discussed and resolved by consensus, consulting a third reviewer (L.Y.D) if necessary. The tool comprises 4 domains: (1) patient selection, which describes the method for selecting patients and the patients included; (2) index test, which describes the test being studied, how it was conducted, and how the results were interpreted; (3) reference standard, which describes the reference standard test used, how it was conducted, and how the results were interpreted; and (4) flow and timing, which describes the flow of patient inclusion and exclusion and the interval between the index test and the reference standard. [19] Each question can be answered with "yes," "no," or "unclear," and the corresponding risk of bias can be judged as "low risk," "high risk," or "unclear risk." Review Manager 5.3 software was used to evaluate the risk of bias of each included study and to draw the risk-of-bias figure. An approach for grading the quality of evidence from diagnostic accuracy studies has been developed, which provides guidance on how to translate accuracy data into recommendations involving patient-important outcomes. [20,21] We rated the quality of the evidence using this GRADE approach, which considers 5 aspects: risk of bias, indirectness, inconsistency, imprecision, and publication bias.

Statistical analyses
We first extracted the 2 × 2 contingency table (true positives, true negatives, false positives, and false negatives) from each study. Some of the primary studies did not directly report all the data in the 2 × 2 tables; in these cases, we calculated the missing cells from the data available in the text or appendices of each primary study, using the calculator in Review Manager 5.3. From these tables, we determined the true-positive rate (TPR; sensitivity) and the true-negative rate (TNR; specificity), and generated forest plots and summary receiver operating characteristic (SROC) curves with Review Manager 5.3. Stata 12.0 was also used to produce forest plots presenting the sensitivity and specificity and their pooled results. The SROC curve plots each study according to its sensitivity (y-axis) and specificity (x-axis); each data point represents 1 particular study, and the area under the curve (AUC) was the final comparison indicator. The diagnostic accuracy of CAD was classified according to the AUC as excellent (0.90-1), good (0.80-0.90), fair (0.70-0.80), poor (0.60-0.70), or failed (0.50-0.60). [22] Heterogeneity in test accuracy among studies was explored using the inconsistency index (I² value) and Cochran Q statistic for each forest plot; the difference was considered significant when the P value was less than .05. I² values greater than 50% indicated substantial heterogeneity in the diagnostic parameters across studies. [23,24]
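The quantities above can be made concrete with a small, self-contained sketch. This is not the RevMan/Stata bivariate model used in the review; it is a simplified fixed-effect pooling on the logit scale, with hypothetical counts, shown only to illustrate the definitions of sensitivity, specificity, the back-calculation of missing 2 × 2 cells, Cochran Q, I², and the AUC bands.

```python
import math

def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity (TPR) and specificity (TNR) from a 2x2 contingency table."""
    return tp / (tp + fn), tn / (tn + fp)

def reconstruct_2x2(sens, spec, n_pos, n_neg):
    """Back-calculate missing 2x2 cells from reported sensitivity,
    specificity, and group sizes (the role the RevMan calculator played)."""
    tp = round(sens * n_pos)
    tn = round(spec * n_neg)
    return tp, n_neg - tn, n_pos - tp, tn  # tp, fp, fn, tn

def pooled_logit(props, ns):
    """Inverse-variance pooling of proportions on the logit scale
    (fixed-effect simplification of the pooling reported in the review)."""
    logits, weights = [], []
    for p, n in zip(props, ns):
        # 0.5 continuity correction guards against p == 0 or p == 1
        p_adj = (p * n + 0.5) / (n + 1.0)
        var = 1.0 / (n * p_adj * (1 - p_adj))  # variance of the logit
        logits.append(math.log(p_adj / (1 - p_adj)))
        weights.append(1.0 / var)
    pooled = sum(w * l for w, l in zip(weights, logits)) / sum(weights)
    return 1.0 / (1.0 + math.exp(-pooled))  # back-transform to a proportion

def cochran_q_i2(effects, variances):
    """Cochran Q statistic and the I^2 inconsistency index;
    I^2 > 50% suggests substantial heterogeneity."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - mean) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

def grade_auc(auc):
    """Classify diagnostic accuracy by AUC using the review's bands."""
    for cutoff, label in [(0.9, "excellent"), (0.8, "good"),
                          (0.7, "fair"), (0.6, "poor"), (0.5, "failed")]:
        if auc >= cutoff:
            return label
    return "failed"
```

For example, a hypothetical study with TP = 47, FP = 11, FN = 6, TN = 38 has sensitivity 47/53 ≈ 0.89 and specificity 38/49 ≈ 0.78, and an AUC of 0.89 falls in the "good" band.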

Subgroup-analyses
We conducted subgroup analyses according to the type of CAD classifier used and the prostate zone (peripheral zone, transitional zone, and central gland). Publication bias was investigated using the Deeks funnel plot asymmetry test. [25]

Results

Quality assessment
According to the QUADAS-2 quality assessment results, 7 studies [11,13,14,27,29,30,34] enrolled consecutive samples of patients. All studies avoided inappropriate exclusions. Only 2 studies [12,26] reported the interval between the index test and the reference standard. All studies used biopsy as the gold standard. The overall quality of the included studies was satisfactory (Fig. 2). The quality of the evidence was low, as shown in Table 3.

Diagnostic accuracy
Ten [4,11,13,14,27,29,31,32,34,35] of the 15 included studies provided sufficient data to determine the 2 × 2 tables and were thus eligible for the meta-analysis. The remaining 5 studies [12,26,28,30,33] all concluded that the CAD method may assist the radiologist in detecting PCa locations and could potentially guide biopsy toward the most aggressive part of the tumor. Our meta-analysis showed that the sensitivity of CAD systems ranged from 0.47 to 1.00 and specificity from 0.47 to 0.89 across studies. The pooled sensitivity of CAD systems was 0.87 (95% CI, 0.76-0.94) and the pooled specificity was 0.76 (95% CI, 0.62-0.85; Fig. 3). The SROC curve of CAD systems is shown in Figure 5; the AUC was 0.89 (95% CI, 0.86-0.91). Heterogeneity was high for both sensitivity (I² = 90.3%) and specificity (I² = 95.8%). Subgroup analyses showed that the sensitivity of CAD systems in studies that did not specify a classifier ranged from 0.47 to 1.00 and the specificity from 0.48 to 0.88; the sensitivity of CAD systems based on an artificial neural network classifier ranged from 0.66 to 0.77 and the specificity from 0.64 to 0.92; the sensitivity of CAD using a radiomic machine learning classifier was 0.96 and the specificity 0.51; and the sensitivity of CAD using a support vector machine (SVM) classifier ranged from 0.87 to 0.92 and the specificity from 0.47 to 0.95. In the peripheral zone, sensitivity ranged from 0.66 to 1.00 and specificity from 0.48 to 0.89; in the transitional zone, sensitivity ranged from 0.89 to 1.00 and specificity from 0.38 to 0.85; and in the central gland, sensitivity was 0.66 and specificity 0.92 (Fig. 4).

Publication bias
The P value for the Deeks test in this meta-analysis was .47, indicating no publication bias. The results of the publication bias assessment are shown in Figure 6.
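The Deeks test regresses the log diagnostic odds ratio (lnDOR) on the inverse square root of the effective sample size, weighting by the effective sample size; a non-significant slope, as here, argues against small-study effects. A minimal sketch of that regression, using hypothetical 2 × 2 counts rather than the actual study data:

```python
import numpy as np
from scipy import stats

def deeks_test(tp, fp, fn, tn):
    """Deeks funnel plot asymmetry test (sketch): weighted least-squares
    regression of lnDOR on 1/sqrt(effective sample size), weighted by the
    effective sample size. Returns the two-sided P value for the slope;
    P < .05 suggests funnel plot asymmetry (possible publication bias)."""
    # 0.5 continuity correction so zero cells do not break the log
    tp, fp, fn, tn = (np.asarray(a, float) + 0.5 for a in (tp, fp, fn, tn))
    ln_dor = np.log(tp * tn / (fp * fn))
    n1, n2 = tp + fn, fp + tn            # diseased / non-diseased counts
    ess = 4 * n1 * n2 / (n1 + n2)        # effective sample size
    x, w = 1 / np.sqrt(ess), ess
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ ln_dor)
    resid = ln_dor - X @ beta
    dof = len(x) - 2
    sigma2 = (w * resid ** 2).sum() / dof
    cov = sigma2 * np.linalg.inv(X.T @ W @ X)
    t = beta[1] / np.sqrt(cov[1, 1])
    return 2 * stats.t.sf(abs(t), dof)
```

The design choice of weighting by effective sample size (rather than total sample size) reflects that diagnostic studies with very unbalanced diseased/non-diseased groups carry less information about the DOR.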

Discussion
Based on the available evidence, the overall sensitivity of CAD systems for diagnosing PCa ranged from 0.47 to 1.00 and specificity from 0.47 to 0.89, with an AUC of 0.89 (0.86-0.91). Subgroup analyses indicated that the SVM produced the best AUC among the CAD classifiers. The sensitivity and specificity of CAD in the transitional zone were higher than in the peripheral zone and central gland. A review [36] published in 2014 concluded that kernel-based learning methods such as SVMs showed high sensitivity and specificity and were employed by the majority of research groups for classifying PCa versus normal tissue, which is consistent with our results. However, the sample sizes in both our study and the previous review were small, because many studies did not specify the classifier they used; among our included studies, only 3 reported using an SVM. These results should therefore be interpreted with caution. (The basic characteristics of the included studies are summarized in Table 1.) The performance of the CAD system may depend on the location of the tumor within the prostate, for example, the central gland, peripheral zone, or transitional zone. Among the included studies, only 3 [4,31,34] measured sensitivity and specificity in different prostate zones; the other studies did not distinguish between prostate zones and reported only a single set of data.
In our meta-analysis, we conducted a subgroup analysis according to prostate zone; the results showed that the sensitivity and specificity in the transitional zone were higher than in the peripheral zone and central gland. Future primary studies should therefore consider the influence of the different prostate regions on the diagnostic results when exploring the diagnostic accuracy of CAD systems for PCa.
A prerequisite for the high accuracy of CAD is training the classifier on a database with characteristics similar to those of the testing database. [36,37] An important limitation of most of the included trials was the lack of description of classifier training, and it is unclear whether this is because training was not performed or simply not reported. Even if the classifier was trained on scanner data with the same field strength (3T or 1.5T), differences in factors such as technical characteristics, coils, static magnetic field, and protocols for resonance frequency adjustment could affect the outcome substantially. In a thorough review by Wang et al, [36] numerous studies with databases varying between 15 and 100 patients were compared, not only in terms of performance but also in terms of the analyzed modalities, field strength, ground truth, method for candidate lesion generation, and applied classifier, revealing broad heterogeneity. The high heterogeneity among the studies included in our meta-analysis may also be due to these factors.
Overall, CAD-aided MRI is suitable for PCa screening, because multiple prostate-specific features, such as distinction of the peripheral and transition zones, synchronized viewing of T2-weighted and T1-weighted DCE-MRI images, analysis of T2-weighted morphological data, and volume assessment, were missing from other available software. [38] (The details of the CAD systems of the included studies, including field strength, classifier, and processing steps, are summarized in Table 2.) Additionally, in our meta-analysis, the subgroup analysis by prostate zone showed that the sensitivity and specificity of CAD in the transitional zone differed from those in the peripheral zone and central gland, suggesting that different regions of the prostate differ in how readily screening tools detect disease there. Screening tools other than CAD-assisted MRI may fail to capture such lesions because of poor sensitivity.

Strengths and limitations
Systematic reviews and meta-analyses can provide more reliable evidence than individual trials, [39,40] as their outcomes are derived from all published clinical trials and can be systematically assessed for risk of bias. [41,42] We conducted a systematic review of CAD systems based on MRI for the diagnosis of PCa using appropriate methodologies and quality assessment tools, which may feed into evidence-based clinical practice. To our knowledge, this is the first systematic review to directly compare the diagnostic accuracy of MRI-based CAD systems against a reference standard for PCa.
Because of some limitations, the results of this meta-analysis should be interpreted carefully. First, the number of patients in some studies was relatively small, which can reduce statistical power. Second, some studies did not report the classifier used by the CAD system, so only a few studies could be included in the subgroup analyses comparing classifiers, which may reduce the reliability of those results.

Conclusions
Our study indicated that the use of CAD systems to interpret prostate MRI results has high sensitivity and specificity for diagnosing PCa. We believe the SVM should be recommended as the classifier of choice for CAD systems. Whether the performance of the CAD system depends on the specific zones of the prostate needs further investigation.