Abstract
Quite often, the available pre-biopsy data for early prostate cancer detection are imbalanced. When least squares support vector machines (LS-SVMs) are applied in such scenarios, it is natural to introduce the well-known area under the ROC curve (AUC) performance index into the LS-SVM framework to avoid bias towards the majority class. However, doing so may result in high computational complexity when seeking the minimal leave-one-out error. In this paper, by introducing the parameter \(\lambda \), a generalized AUC performance index \(R_{AUCLS}\) is developed, with a theoretical guarantee that \(R_{AUCLS}\) depends linearly on the classical AUC performance index \(R_{AUC}\). Based on both \(R_{AUCLS}\) and the classical LS-SVM, a new AUC-based least squares support vector machine called AUC-LS-SVMs is proposed for directly and effectively classifying imbalanced prostate cancer data. The distinctive advantage of the proposed classifier AUC-LS-SVMs is that it can achieve the minimal leave-one-out error by quickly optimizing the parameter \(\lambda \) in \(R_{AUCLS}\) using the proposed fast leave-one-out cross validation (LOOCV) strategy. The proposed classifier is first evaluated on generic public datasets. Further experiments are then conducted on a real-world prostate cancer dataset to demonstrate the efficacy of the proposed classifier for early prostate cancer detection.
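As a point of reference for the classical index \(R_{AUC}\) named above, the following minimal sketch computes AUC as the fraction of (positive, negative) pairs that a scoring function ranks correctly, i.e. the Wilcoxon-Mann-Whitney statistic. It assumes labels in \(\{+1,-1\}\); the function name `pairwise_auc` and the toy data are illustrative assumptions, not taken from the paper, and the sketch does not implement \(R_{AUCLS}\) or the proposed classifier.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """Classical AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == -1]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one sample from each class")
    # diff[k, l] = score of positive k minus score of negative l
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return correct / (pos.size * neg.size)

# Illustrative imbalanced toy data: 2 positives, 6 negatives (hypothetical values).
labels = np.array([1, 1, -1, -1, -1, -1, -1, -1])
scores = np.array([0.9, 0.4, 0.5, 0.3, 0.2, 0.1, 0.05, 0.0])

auc = pairwise_auc(scores, labels)

# A trivial "always negative" rule reaches 75% accuracy on this data,
# yet a constant score carries no ranking information at all.
all_negative_accuracy = np.mean(labels == -1)
constant_auc = pairwise_auc(np.zeros(labels.size), labels)

print(auc, all_negative_accuracy, constant_auc)
```

On this toy data the majority-class rule looks acceptable by accuracy while its AUC stays at chance level, which is the kind of bias towards the majority class that motivates an AUC-based criterion.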
References
Cancer stat facts: prostate cancer. https://seer.cancer.gov/statfacts/html/prost.html. Accessed 30 Apr 2018
From development to use in clinical practice - ERSPC prostate cancer risk calculator. http://www.prostatecancer-riskcalculator.com/from-development-to-use-in-clinical-practice-erspc-prostate-cancer-risk-calculator. Accessed 30 Apr 2018
LIBSVM data: classification (binary class). https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html. Accessed 30 Apr 2018
UCI machine learning repository. https://archive.ics.uci.edu/ml/datasets.html. Accessed 30 Apr 2018
(2004) Optimising area under the ROC curve using gradient descent. In: Proceedings of the Twenty-first international conference on machine learning, ACM, p 49
Ablin R, Pfeiffer L, Gonder M, Soanes W (1968) Precipitating antibody in the sera of patients treated cryosurgically for carcinoma of the prostate. Exp Med Surg 27(4):406–410
Artan Y, Haider MA, Langer DL, Van der Kwast TH, Evans AJ, Yang Y, Wernick MN, Trachtenberg J, Yetik IS (2010) Prostate cancer localization with multispectral MRI using cost-sensitive support vector machines and conditional random fields. IEEE Trans Image Process 19(9):2444–2455
Brefeld U, Scheffer T (2005) AUC maximizing support vector learning. In: Proceedings of the international conference on machine learning (ICML) 2005 workshop on ROC analysis in machine learning
Calders T, Jaroszewicz S (2007) Efficient AUC optimization for classification. In: European conference on principles of data mining and knowledge discovery, Springer, pp 42–53
Catalona W, Hudson M, Scardino P, Richie J, Ahmann F, Flanigan R, DeKernion J, Ratliff T, Kavoussi L, Dalkin B (1994) Selection of optimal prostate specific antigen cutoffs for early detection of prostate cancer: receiver operating characteristic curves. J Urol 152(6 Pt 1):2037–2042
Catalona W, Richie J, Ahmann F, Hudson M, Scardino P, Flanigan R, Dekernion J, Ratliff T, Kavoussi L, Dalkin B (1994) Comparison of digital rectal examination and serum prostate specific antigen in the early detection of prostate cancer: results of a multicenter clinical trial of 6,630 men. J Urol 151(5):1283–1290
Cawley GC (2006) Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. In: The 2006 IEEE international joint conference on neural network proceedings, IEEE, pp 1661–1668
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27
Chawla NV, Japkowicz N, Kotcz A (2004) Special issue on learning from imbalanced data sets. ACM SIGKDD Explor Newsl 6(1):1–6
Çınar M, Engin M, Engin EZ, Ateşçi YZ (2009) Early prostate cancer diagnosis by using artificial neural networks and support vector machines. Expert Syst Appl 36(3):6357–6361
Cortes C, Mohri M (2004) AUC optimization vs. error rate minimization. In: Advances in neural information processing systems, pp 313–320
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Elkan C (2001) The foundations of cost-sensitive learning. In: International joint conference on artificial intelligence, Lawrence Erlbaum Associates Ltd, vol 17, pp 973–978
Gao W, Jin R, Zhu S, Zhou ZH (2013) One-pass AUC optimization. In: International conference on machine learning, pp 906–914
Gao W, Zhou ZH (2015) On the consistency of AUC pairwise optimization. In: International joint conference on artificial intelligence (IJCAI), pp 939–945
Ghazikhani A, Monsefi R, Yazdi HS (2014) Online neural network model for non-stationary and imbalanced data stream classification. Int J Mach Learn Cybern 5(1):51–62
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36
Holst A et al (2008) Efficient AUC maximization with regularized least-squares. In: Tenth Scandinavian conference on artificial intelligence: SCAI 2008, IOS Press, vol 173, p 12
Joachims T (2005) A support vector method for multivariate performance measures. In: Proceedings of the 22nd international conference on machine learning, ACM, pp 377–384
Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Prog Artif Intell 5(4):221–232
Lee W, Jun CH, Lee JS (2017) Instance categorization by support vector machines to adjust weights in AdaBoost for imbalanced data classification. Inf Sci 381:92–103
Li S, Zhang Y, Xu J, Li L, Zeng Q, Lin L, Guo Z, Liu Z, Xiong H, Liu S (2014) Noninvasive prostate cancer screening based on serum surface-enhanced Raman spectroscopy and support vector machine. Appl Phys Lett 105(9):091104
Liu Y (2004) Active learning with support vector machine applied to gene expression data for cancer classification. J Chem Inf Comput Sci 44(6):1936–1941
Mao W, Wang J, Xue Z (2017) An ELM-based model with sparse-weighting strategy for sequential data imbalance problem. Int J Mach Learn Cybern 8(4):1333–1345
Nadji M, Tabei SZ, Castro A, Chu TM, Murphy GP, Wang MC, Morales AR (1981) Prostatic-specific antigen: an immunohistologic marker for prostatic neoplasms. Cancer 48(5):1229–1232
Rakotomamonjy A (2004) Optimizing area under ROC curve with SVMs. In: ROCAI, pp 71–80
Rezvani S, Wang X, Pourpanah F (2019) Intuitionistic fuzzy twin support vector machines. IEEE Trans Fuzzy Syst 27(11):2140–2151
Riedel KS (1992) A Sherman-Morrison-Woodbury identity for rank augmenting matrices with application to centering. SIAM J Matrix Anal Appl 13(2):659–662
Suykens J, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J (2002) Least squares support vector machine classifiers. World Scientific, Singapore
Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Netw 10(5):988–999
Wang G, Lu J, Choi KS, Zhang G (2018) A transfer-based additive LS-SVM classifier for handling missing data. IEEE Trans Cybern 50(2):739–752
Wang G, Zhang G, Choi K, Lu J (2019) Deep additive least squares support vector machines for classification with model transfer. IEEE Trans Syst Man Cybern Syst 49(7):1527–1540
Ye J, Xiong T (2007) SVM versus least squares SVM. In: Artificial intelligence and statistics, pp 644–651
Ying Y, Wen L, Lyu S (2016) Stochastic online AUC maximization. In: Advances in neural information processing systems, pp 451–459
Zhang C, Zhou Y, Guo J, Wang G, Wang X (2018) Research on classification method of high-dimensional class-imbalanced datasets based on SVM. Int J Mach Learn Cybern. https://doi.org/10.1007/s13042-018-0853-2
Zhang K, Kwok JT (2010) Simplifying mixture models through function approximation. IEEE Trans Neural Netw 21(4):644–658
Zhao P, Hoi SC, Jin R, Yang T (2011) Online AUC maximization. In: Proceedings of the 28th international conference on machine learning ICML. International Machine Learning Society
Zhou ZH, Liu XY (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng 18(1):63–77
Zhu Z, Wang Z, Li D, Du W (2019) Multiple empirical kernel learning with majority projection for imbalanced problems. Appl Soft Comput 76:221–236
Acknowledgements
The work was supported by the Innovation and Technology Commission of the Government of the Hong Kong SAR under the ITF-MRP project (MRP/015/18) and by the Australian Research Council (ARC) under Discovery Grant DP170101632. G. Wang is supported by the Murdoch New Staff Startup Grant (SEIT NSSG).
Appendix
Equation (10) can be reformulated as
To derive the dual problem, we construct the Lagrangian \(J\) for Eq. (23)
where \(\varvec{\alpha }=(\alpha _1,\alpha _2,\ldots ,\alpha _{N})\) is the vector of Lagrangian multipliers. The conditions for optimality are given by
Since \(\varvec{w}^T\left( \varphi (\varvec{x}_k) -\varphi (\varvec{x}_l)\right) \) is a scalar, \(\varvec{w}^T \left( \varphi (\varvec{x}_k)-\varphi (\varvec{x}_l)\right) =\left( \varphi (\varvec{x}_k)-\varphi (\varvec{x}_l)\right) ^T\varvec{w}\). We can therefore rewrite Eq. (25) as
where \(\mathbf{H} =\left[ \varvec{I}+\frac{C}{n^+ n^-}\sum _{k\in N^+} \sum _{l\in N^-}\left( \varphi (\varvec{x}_k) -\varphi (\varvec{x}_l)\right) \left( \varphi (\varvec{x}_k) -\varphi (\varvec{x}_l)\right) ^T\right] ^{-1}\), \(\varvec{I}\) is the \(N\times N\) identity matrix, and \(\left( \varphi (\varvec{x}_k) -\varphi (\varvec{x}_l)\right) \left( \varphi (\varvec{x}_k)-\varphi (\varvec{x}_l)\right) ^T\) is an \(N\times N\) matrix.
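According to the Sherman-Morrison-Woodbury formula [33], given an invertible (nonsingular) matrix \(\mathbf{A} \) and column vectors \(\varvec{u}\) and \(\varvec{v}\), assuming \(1+\varvec{v}^{T}\mathbf{A} ^{-1} \varvec{u}\ne 0\), we have
\[(\mathbf{A} +\varvec{u}\varvec{v}^T)^{-1}=\mathbf{A} ^{-1}-\frac{\mathbf{A} ^{-1}\varvec{u}\varvec{v}^T\mathbf{A} ^{-1}}{1+\varvec{v}^T\mathbf{A} ^{-1}\varvec{u}}\]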
In particular, if \(\mathbf{A} =\varvec{I}\), we immediately have \((\varvec{I}+\varvec{u}\varvec{v}^T)^{-1}=\varvec{I} -\frac{\varvec{u}\varvec{v}^T}{1+\varvec{v}^T\varvec{u}}\). By applying this formula to \(\mathbf{H}\), we can rewrite \(\mathbf{H}\) as
We notice that the denominator in Eq. (31) is a scalar. If we use \(M\) to represent it, Eq. (31) can be simplified to
and accordingly Eq. (26) can be simplified to
By eliminating \(\varvec{w}\) and \(\xi _i\), we obtain the following solution
We denote \(k(\varvec{x}_i,\varvec{x}_k)-\frac{1}{M}\sum _{p \in N^+}\sum _{l \in N^-}\left( k(\varvec{x}_i,\varvec{x}_p) -k(\varvec{x}_i,\varvec{x}_l)\right) \left( k(\varvec{x}_p, \varvec{x}_k)-k(\varvec{x}_l,\varvec{x}_k)\right) \) as \(\tilde{k}(\varvec{x}_i,\varvec{x}_k)\), and \(\frac{C}{n^+ n^-} \sum _{k \in N^+}\sum _{l \in N^-}\left\{ \left( k(\varvec{x}_i, \varvec{x}_k)-k(\varvec{x}_i,\varvec{x}_l)\right) -\frac{1}{M}\sum _{p\in N^+} \sum _{q\in N^-} \left( k(\varvec{x}_i, \varvec{x}_p)-k(\varvec{x}_i,\varvec{x}_q)\right) \sum _{k\in N^+}\sum _{l \in N^-}\left( k(\varvec{x}_p,\varvec{x}_k) -k(\varvec{x}_p,\varvec{x}_l) -k(\varvec{x}_q, \varvec{x}_k)+k(\varvec{x}_q,\varvec{x}_l)\right) \right\} \) as \( f(\varvec{x}_i)\). With this notation, we can rewrite Eq. (34) as
We can further write the above linear system in matrix form
where \(\varvec{y}=[y_1;\ldots ;y_N]^T\), \(\varvec{1}=[1;\ldots ;1]\), \(\varvec{f} =[f(\varvec{x}_1);\ldots ;f(\varvec{x}_N)]^T\), and \(\tilde{\mathbf{K }}=(\tilde{k}(\varvec{x}_i,\varvec{x}_k))_{N \times N}\).
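The rank-one inverse update invoked earlier in this appendix can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the helper name `sherman_morrison_inverse`, the small matrices, and the constant `c` are hypothetical and chosen only for the check. It covers a single rank-one term \(\varvec{u}\varvec{v}^T\) and the special case \(\mathbf{A}=\varvec{I}\); how the double sum inside \(\mathbf{H}\) is handled is taken from the derivation above and is not re-derived here.

```python
import numpy as np

def sherman_morrison_inverse(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, using the rank-one update formula."""
    denom = 1.0 + v @ A_inv @ u
    if np.isclose(denom, 0.0):
        raise ValueError("1 + v^T A^{-1} u is zero; the updated matrix is singular")
    # A^{-1} u v^T A^{-1} written as an outer product of (A^{-1} u) and (v^T A^{-1})
    return A_inv - np.outer(A_inv @ u, v @ A_inv) / denom

# General case with a small, hand-picked invertible matrix (illustrative values).
A = np.diag([2.0, 3.0, 4.0])
u = np.array([1.0, 0.0, 1.0])
v = np.array([0.5, 1.0, 0.0])

direct = np.linalg.inv(A + np.outer(u, v))
updated = sherman_morrison_inverse(np.linalg.inv(A), u, v)
general_ok = np.allclose(direct, updated)

# Special case A = I with u = v = sqrt(c) * delta, where delta stands in for
# one difference vector phi(x_k) - phi(x_l) and c for a positive weight.
c = 0.3
delta = np.array([1.0, -2.0, 0.5])
closed_form = np.eye(3) - c * np.outer(delta, delta) / (1.0 + c * delta @ delta)
special_ok = np.allclose(closed_form, np.linalg.inv(np.eye(3) + c * np.outer(delta, delta)))

print(general_ok, special_ok)
```

The practical appeal of the update is that the inverse is obtained from matrix-vector products and one scalar division, rather than from a fresh matrix inversion.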
Cite this article
Wang, G., Teoh, J.YC., Lu, J. et al. Least squares support vector machines with fast leave-one-out AUC optimization on imbalanced prostate cancer data. Int. J. Mach. Learn. & Cyber. 11, 1909–1922 (2020). https://doi.org/10.1007/s13042-020-01081-y
DOI: https://doi.org/10.1007/s13042-020-01081-y