Ethics. The present study was approved by the scientific committee (COMIC), under protocol number 36/2021, and by the research ethics committee (CEP), under CAAE register nº 45444621.6.0000.5086, with substantiated technical opinion ID 4.679.671, both belonging to the University Hospital of the Federal University of Maranhão (HU-UFMA), located in the city of São Luís-MA, Brazil. To protect the privacy of the clinical data, all ethical principles concerning patient rights were met, and participant names were not used. All methods were performed in accordance with relevant guidelines and regulations. The HU-UFMA authorized the study, approved all experiments, and waived the requirement for an informed consent form, because the data were used in a retrospective study without affecting patient care.
Participants of the study. A total of 84 patients followed in the urology sector of the HU-UFMA were included in the study. The inclusion criteria were: having undergone a prostate biopsy, having a complete medical record, and being over 40 years of age.
The block diagram of the proposed method is shown in Fig. 1.
Data acquisition. The data for this study were drawn from the medical records of patients registered in the HU-UFMA system, which contain sociodemographic information and clinical variables. All patients had undergone prostate biopsy after urologists raised suspicion of cancer. A semi-structured questionnaire was used to guide the extraction of data from the medical records, after which a database with all the features was created. The sample consisted of two classes: patients with a positive biopsy for cancer (Ca class) and patients with a negative biopsy (Normal class). The data were collected by the urology team of the HU-UFMA and by participants in the urology academic league of the UFMA medical course.
Feature selection. For this stage, the relevant literature was reviewed, and medical specialists in the field of urology were consulted to choose the features considered most relevant for this study. Samples with missing values were removed. The features used were age, race, Systemic Arterial Hypertension (SAH), Diabetes Mellitus (DM), smoking, alcoholism, DRE (prostate weight), and total PSA (tPSA).
Preprocessing. For some features, an initial parameterization was performed so that they could be used as input for each machine learning model. Table 1 describes the encoding used.
Table 1. Description of dataset characteristics.

Feature | Description
Age | Years |
Race | White (1), Brown (2), Black (3), Indigenous (4) |
SAH | Yes (1), No (2) |
DM | Yes (1), No (2) |
Smoking | Yes (1), No (2), Ex (3) |
Alcoholism | Yes (1), No (2), Ex (3) |
DRE | Estimated prostate weight (g)
tPSA | ng/ml |
Label | Normal (0), Cancer (1) |
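The encoding in Table 1 can be applied as a simple mapping before training. The sketch below is illustrative only and assumes Python; the field names and the sample record are hypothetical, not drawn from the study data.

```python
# Hypothetical helper applying the integer codes of Table 1 to one record.
def encode_record(rec):
    """Map a raw patient record (dict) to the numeric codes of Table 1."""
    race = {"White": 1, "Brown": 2, "Black": 3, "Indigenous": 4}
    yes_no = {"Yes": 1, "No": 2}
    yes_no_ex = {"Yes": 1, "No": 2, "Ex": 3}
    return [
        rec["age"],                 # years
        race[rec["race"]],
        yes_no[rec["sah"]],
        yes_no[rec["dm"]],
        yes_no_ex[rec["smoking"]],
        yes_no_ex[rec["alcoholism"]],
        rec["dre"],                 # estimated prostate weight (g)
        rec["tpsa"],                # ng/ml
    ]

# Illustrative record, not a real patient.
sample = {"age": 65, "race": "Brown", "sah": "Yes", "dm": "No",
          "smoking": "Ex", "alcoholism": "No", "dre": 45.0, "tpsa": 6.2}
print(encode_record(sample))  # [65, 2, 1, 2, 3, 2, 45.0, 6.2]
```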
Classification. During the classification stage, in which the samples were predicted as cancer (1) or normal (0), several machine learning techniques were used to compare results and verify which achieved the best performance for the proposed method. The techniques used were: Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), Decision Trees (DT), and Artificial Neural Networks (ANN).
The Support Vector Machine (SVM) [10–14] is a supervised learning method able to determine, from n observed individuals belonging to several subgroups, to which class an individual belongs. The idea of the SVM is to build a hyperplane as a decision surface such that the margin of separation between the classes is as large as possible. The goal of training an SVM is to obtain hyperplanes that divide the samples in such a way that the generalization bounds are optimized. Even when the two classes are not fully separable, the SVM can find a hyperplane using concepts from optimization theory [15].
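As a minimal sketch of how such a maximum-margin classifier could be trained, the example below fits scikit-learn's SVC (an assumption; the paper does not name its implementation) on synthetic 8-feature data standing in for the patient table.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 8-feature patient data (not the study's records).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# RBF-kernel SVM; C controls how soft the maximum-margin hyperplane is
# when the classes are not fully separable.
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # hit rate on the held-out split
```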
The Naïve Bayes (NB) classifier is a probability-based classifier that works on the principle of the Bayes theorem. It is based on conditional probability and the assumption that the attributes are independent of each other. Although this assumption is not valid for practical applications, the performance of this classifier is still on par with more complex classifiers. Naïve Bayes classifiers are simple models with excellent performance. The performance of the models may be tweaked according to individual preferences based on the application. Grid search, random search, and sequential model-based optimization (SMBO) can be implemented for hyperparameter optimization [16–18].
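Since the paragraph above mentions grid search for hyperparameter tuning, the sketch below combines scikit-learn's GaussianNB with GridSearchCV over its smoothing parameter; the search grid is an illustrative assumption, not the study's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Synthetic data in place of the patient records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Grid search over var_smoothing, the main GaussianNB hyperparameter,
# using 5-fold cross-validation to score each candidate.
grid = GridSearchCV(GaussianNB(),
                    {"var_smoothing": np.logspace(-12, -3, 10)}, cv=5)
grid.fit(X, y)
best_smoothing = grid.best_params_["var_smoothing"]
```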
The KNN algorithm uses feature similarity to predict the values of new data points: a new point is assigned a value based on how closely it resembles the points in the training set [19]. KNN is a simple and powerful non-parametric supervised method that can be used for classification and regression. To classify a test sample, the K training samples closest to it are chosen from the training dataset. For classification tasks, the dominant label among the target labels of the K chosen training samples is taken as the predicted label for the test sample [20].
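The majority-vote rule can be seen on a deliberately tiny 1-D toy set (illustrative data, not from the study), assuming scikit-learn's KNeighborsClassifier:

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny 1-D toy set: two points of each class.
X = [[0.0], [0.2], [1.0], [1.2]]
y = [0, 0, 1, 1]

# With K=3, the query 0.1 has neighbours 0.0, 0.2 and 1.0,
# so the majority label among them is 0.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[0.1]]))  # [0]
```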
The DT model is a common supervised learning model and decision support tool for classification. It classifies the data by learning simple decision rules derived from the data features. The maximum depth of the tree and the minimum sample split are the parameters that need to be determined in the calibration process [21].
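The two calibration parameters named above map directly onto scikit-learn's DecisionTreeClassifier arguments; the values below are illustrative assumptions, not the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data in place of the patient records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# max_depth and min_samples_split are the calibration parameters
# mentioned in the text; the chosen values are only an example.
tree = DecisionTreeClassifier(max_depth=4, min_samples_split=10,
                              random_state=0).fit(X, y)
```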
ANNs are non-linear mathematical models that mimic the human brain in learning and decision-making, simulating human cognitive skills. ANNs are used to map and predict outcomes in complex relationships between given 'inputs' and sought-after 'outputs' and can also be used to find patterns in datasets. ANNs can be complex, with hidden layers, and multilayer perceptrons can be trained to process data with deep learning [22].
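A multilayer perceptron with hidden layers can be sketched with scikit-learn's MLPClassifier; the two-hidden-layer architecture below is an assumed example, as the paper does not specify its network topology.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data in place of the patient records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Multilayer perceptron with two hidden layers (16 and 8 neurons);
# the sizes here are illustrative, not the study's configuration.
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000,
                    random_state=0).fit(X, y)
preds = mlp.predict(X[:5])
```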
Performance metrics. In biomedical signal processing and pattern recognition, performance is usually measured by calculating statistical measures on the test results [23]. The test classification results can be divided into True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). TP and TN are the numbers of samples correctly identified by the classifier as positive or negative, respectively, while FP and FN are the numbers of samples erroneously classified as positive or negative, respectively. These numbers are used to generate measures that quantify the performance of the methodology, assessing how efficient it is and whether the objectives were achieved. The performance measures used in this research are Accuracy, Specificity, Sensitivity, and AUROC.
Accuracy (Acc) is the classifier's hit rate during the test phase, and is defined by:
$$Acc=\frac{TP+TN}{TP+TN+FP+FN} \quad (1)$$
Sensitivity (Sen) is the proportion of true positives that are correctly classified by the test, and is defined by:
$$Sen=\frac{TP}{TP+FN} \quad (2)$$
Specificity (Spe) is the proportion of true negatives that are correctly classified by the test, and is defined by:
$$Spe=\frac{TN}{TN+FP} \quad (3)$$
AUROC is a way to graphically summarize the relationship between sensitivity and specificity. The AUROC was estimated on the test dataset in each training process, and the mean AUROC values were compared. The AUROC measures the capability of a classifier to distinguish between classes and is used as a summary of the ROC curve: the greater the AUROC, the better the model discerns the positive and negative classes.
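On a toy example where the classifier's scores separate the classes perfectly, the AUROC reaches its maximum of 1.0; the labels and scores below are invented for illustration, assuming scikit-learn's roc_auc_score.

```python
from sklearn.metrics import roc_auc_score

# Invented labels and decision scores: every positive scores higher
# than every negative, so the ranking is perfect.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.2, 0.8, 0.9]
print(roc_auc_score(y_true, scores))  # 1.0
```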
To attest to the reliability of the method and the classifiers, 5-fold cross-validation [24] was used: the data set is divided into 5 equal subsets, training is carried out on 4 concatenated subsets, and classification uses the remaining subset. The training and testing phases are then repeated 5 times, circularly permuting the subsets. The final accuracy is the average of the accuracies of the 5 folds.
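The 5-fold scheme described above can be sketched with scikit-learn's cross_val_score; the SVM and the synthetic data below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic data in place of the patient records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# cv=5 splits the data into 5 equal folds, trains on 4 and tests on the
# remaining one, rotating the held-out fold; one accuracy per fold.
scores = cross_val_score(SVC(), X, y, cv=5)
mean_acc = scores.mean()  # final accuracy is the average over the 5 folds
```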