Ethics. The present study was approved by the scientific committee (COMIC), under protocol number 36/2021, and by the research ethics committee (CEP), under CAAE register nº 45444621.6.0000.5086, with substantiated technical opinion ID 4.679.671, both belonging to the University Hospital of the Federal University of Maranhão (HU-UFMA), located in the city of São Luís-MA, Brazil. To protect the privacy of the clinical data, all ethical principles concerning patient rights were met, and participant names were not used. All methods were performed in accordance with relevant guidelines and regulations. The HU-UFMA authorized the study, approved all experiments, and waived the requirement for an informed consent form, because the data were used in a retrospective study without affecting patient care.
Participants of the study. A total of 84 patients followed in the urology sector of the HU-UFMA were included in the study. The inclusion criteria were: having undergone a prostate biopsy, having a complete medical record, and being over 40 years of age.
The block diagram of the proposed method is shown in Fig. 1.
Data acquisition. The data for this study were drawn from the medical records of patients registered in the HU-UFMA system, which contain sociodemographic information and clinical variables. All patients had undergone prostate biopsy after urologists raised suspicion of cancer. A semi-structured questionnaire was used to guide the extraction of data from the medical records, after which a database with all the features was created. The sample consisted of two classes: patients with a positive biopsy for cancer (Ca class) and patients with a negative biopsy (Normal class). The data were collected by the urology team of the HU-UFMA and by participants in the urology academic league of the UFMA medical course.
Feature selection. For this stage, the relevant literature was reviewed, and medical specialists in the field of urology were consulted to choose the features considered most relevant for this study. Samples with missing values were removed. The features used were age, race, Systemic Arterial Hypertension (SAH), Diabetes Mellitus (DM), smoking, alcoholism, DRE (prostate weight), and total PSA (tPSA).
Preprocessing. For some features, an initial parameterization was performed so that they could be used as input for each machine learning model. Table 1 describes the encoding used.
Table 1. Description of dataset characteristics.

Feature | Description
Age | Years |
Race | White (1), Brown (2), Black (3), Indigenous (4) |
SAH | Yes (1), No (2) |
DM | Yes (1), No (2) |
Smoking | Yes (1), No (2), Ex (3) |
Alcoholism | Yes (1), No (2), Ex (3) |
DRE | Estimated prostate weight (g)
tPSA | ng/ml |
Label | Normal (0), Cancer (1) |
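The encoding in Table 1 can be applied as a simple mapping before training. The sketch below is illustrative only and assumes Python; the field names and the sample record are hypothetical, not drawn from the study data.

```python
# Hypothetical helper applying the integer codes of Table 1 to one record.
def encode_record(rec):
    """Map a raw patient record (dict) to the numeric codes of Table 1."""
    race = {"White": 1, "Brown": 2, "Black": 3, "Indigenous": 4}
    yes_no = {"Yes": 1, "No": 2}
    yes_no_ex = {"Yes": 1, "No": 2, "Ex": 3}
    return [
        rec["age"],                 # years
        race[rec["race"]],
        yes_no[rec["sah"]],
        yes_no[rec["dm"]],
        yes_no_ex[rec["smoking"]],
        yes_no_ex[rec["alcoholism"]],
        rec["dre"],                 # estimated prostate weight (g)
        rec["tpsa"],                # ng/ml
    ]

# Illustrative record, not a real patient.
sample = {"age": 65, "race": "Brown", "sah": "Yes", "dm": "No",
          "smoking": "Ex", "alcoholism": "No", "dre": 45.0, "tpsa": 6.2}
print(encode_record(sample))  # [65, 2, 1, 2, 3, 2, 45.0, 6.2]
```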
Classification. During the classification stage, in which the samples were predicted as cancer (1) or normal (0), several machine learning techniques were used to compare results and verify which achieved the best performance for the proposed method. The techniques used were: Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), Decision Trees (DT), and Artificial Neural Networks (ANN).
The Support Vector Machine (SVM) [10–14] is a supervised learning method able to determine, from n observed individuals belonging to several subgroups, to which class an individual belongs. The idea of the SVM is to build a hyperplane as a decision surface such that the margin of separation between the classes is as large as possible. The goal of training an SVM is to obtain hyperplanes that divide the samples in such a way that the generalization bounds are optimized. Even when the two classes are not fully separable, the SVM can find a hyperplane using concepts from optimization theory [15].
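As a minimal sketch of how such a maximum-margin classifier could be trained, the example below fits scikit-learn's SVC (an assumption; the paper does not name its implementation) on synthetic 8-feature data standing in for the patient table.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 8-feature patient data (not the study's records).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# RBF-kernel SVM; C controls how soft the maximum-margin hyperplane is
# when the classes are not fully separable.
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # hit rate on the held-out split
```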
The Naïve Bayes (NB) classifier is a probability-based classifier that works on the principle of the Bayes theorem. It is based on conditional probability and the assumption that the attributes are independent of each other. Although this assumption is not valid for practical applications, the performance of this classifier is still on par with more complex classifiers. Naïve Bayes classifiers are simple models with excellent performance. The performance of the models may be tweaked according to individual preferences based on the application. Grid search, random search, and sequential model-based optimization (SMBO) can be implemented for hyperparameter optimization [16–18].
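Since the paragraph above mentions grid search for hyperparameter tuning, the sketch below combines scikit-learn's GaussianNB with GridSearchCV over its smoothing parameter; the search grid is an illustrative assumption, not the study's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Synthetic data in place of the patient records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Grid search over var_smoothing, the main GaussianNB hyperparameter,
# using 5-fold cross-validation to score each candidate.
grid = GridSearchCV(GaussianNB(),
                    {"var_smoothing": np.logspace(-12, -3, 10)}, cv=5)
grid.fit(X, y)
best_smoothing = grid.best_params_["var_smoothing"]
```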
The KNN algorithm uses feature similarity to predict the values of new data points: a new point is assigned a value based on how closely it resembles the points in the training set [19]. KNN is a simple and powerful non-parametric supervised method that can be used for classification and regression. To classify a test sample, the K training samples closest to it are chosen from the training dataset. For classification tasks, the dominant label among the target labels of the K chosen training samples is taken as the predicted label for the test sample [20].
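The majority-vote rule can be seen on a deliberately tiny 1-D toy set (illustrative data, not from the study), assuming scikit-learn's KNeighborsClassifier:

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny 1-D toy set: two points of each class.
X = [[0.0], [0.2], [1.0], [1.2]]
y = [0, 0, 1, 1]

# With K=3, the query 0.1 has neighbours 0.0, 0.2 and 1.0,
# so the majority label among them is 0.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[0.1]]))  # [0]
```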
The DT model is a common supervised learning model and decision support tool for classification. It classifies the data by learning simple decision rules derived from the data features. The maximum depth of the tree and the minimum sample split are the parameters that need to be determined in the calibration process [21].
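The two calibration parameters named above map directly onto scikit-learn's DecisionTreeClassifier arguments; the values below are illustrative assumptions, not the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data in place of the patient records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# max_depth and min_samples_split are the calibration parameters
# mentioned in the text; the chosen values are only an example.
tree = DecisionTreeClassifier(max_depth=4, min_samples_split=10,
                              random_state=0).fit(X, y)
```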
ANNs are non-linear mathematical models that mimic the human brain in learning and decision-making, simulating human cognitive skills. ANNs are used to map and predict outcomes in complex relationships between given 'inputs' and sought-after 'outputs' and can also be used to find patterns in datasets. ANNs can be complex, with hidden layers, and multilayer perceptrons can be trained to process data with deep learning [22].
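A multilayer perceptron with hidden layers can be sketched with scikit-learn's MLPClassifier; the two-hidden-layer architecture below is an assumed example, as the paper does not specify its network topology.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data in place of the patient records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Multilayer perceptron with two hidden layers (16 and 8 neurons);
# the sizes here are illustrative, not the study's configuration.
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000,
                    random_state=0).fit(X, y)
preds = mlp.predict(X[:5])
```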
Performance metrics. In biomedical signal processing and pattern recognition, performance is usually measured by calculating statistical measures on the test results [23]. The test classification results can be divided into True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). TP and TN are the numbers of samples correctly identified by the classifier as positive or negative, respectively, while FP and FN are the numbers of samples erroneously classified as positive or negative, respectively. These numbers are used to generate measures that quantify the performance of the methodology, assessing how efficient it is and whether the objectives were achieved. The performance measures used in this research are Accuracy, Specificity, Sensitivity, and AUROC.
Accuracy (Acc) is the classifier's hit rate during the test phase, and is defined by:
$$Acc=\frac{TP+TN}{TP+TN+FP+FN} \quad (1)$$
Sensitivity (Sen) is the proportion of true positives that are correctly classified by the test, and is defined by:
$$Sen=\frac{TP}{TP+FN} \quad (2)$$
Specificity (Spe) is the proportion of true negatives that are correctly classified by the test, and is defined by:
$$Spe=\frac{TN}{TN+FP} \quad (3)$$
AUROC is a way to graphically summarize the relationship between sensitivity and specificity. The AUROC was estimated on the test dataset in each training process, and the mean AUROC values were compared. The AUROC measures the capability of a classifier to distinguish between classes and is used as a summary of the ROC curve: the greater the AUROC, the better the model discerns the positive and negative classes.
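On a toy example where the classifier's scores separate the classes perfectly, the AUROC reaches its maximum of 1.0; the labels and scores below are invented for illustration, assuming scikit-learn's roc_auc_score.

```python
from sklearn.metrics import roc_auc_score

# Invented labels and decision scores: every positive scores higher
# than every negative, so the ranking is perfect.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.2, 0.8, 0.9]
print(roc_auc_score(y_true, scores))  # 1.0
```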
To attest to the reliability of the method and the classifiers, 5-fold cross-validation [24] was used: the data set is divided into 5 equal subsets, training is carried out on 4 concatenated subsets, and classification uses the remaining subset. The training and testing phases are then repeated 5 times, circularly permuting the subsets. The final accuracy is the average of the accuracies of the 5 folds.
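The 5-fold scheme described above can be sketched with scikit-learn's cross_val_score; the SVM and the synthetic data below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic data in place of the patient records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# cv=5 splits the data into 5 equal folds, trains on 4 and tests on the
# remaining one, rotating the held-out fold; one accuracy per fold.
scores = cross_val_score(SVC(), X, y, cv=5)
mean_acc = scores.mean()  # final accuracy is the average over the 5 folds
```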