Abstract
A new privacy-preserving proximal support vector machine (P3SVM) is formulated for the classification of vertically partitioned data. Our classifier is based on a global random reduced kernel composed of local reduced kernels. Each local kernel is computed from a local reduced matrix with Gaussian perturbation, which is generated privately by a single party and never made public. This formulation leads to an extremely simple and fast privacy-preserving algorithm for generating a linear or nonlinear classifier, requiring only the solution of a single system of linear equations. Comprehensive experiments on multiple publicly available benchmark datasets evaluate the proposed algorithm, and the results indicate that: (a) P3SVM achieves better classification accuracy and shorter computation time than the recently proposed privacy-preserving SVM via random kernels; (b) P3SVM is significantly more accurate than classifiers trained on each party’s own data alone; (c) the generated classifier has accuracy comparable to that of an ordinary PSVM classifier trained on the entire dataset, without releasing any private data.
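The idea summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: that each local reduced matrix is a random subset of a party's own rows plus Gaussian noise, that the kernel is Gaussian, that local kernels are combined by an elementwise product, and that the classifier is fitted with the standard PSVM linear system of Fung and Mangasarian. All function names, the synthetic data, and the parameters `mu`, `nu`, `sigma`, and `m_bar` are hypothetical.

```python
import numpy as np

def local_gaussian_kernel(A_j, B_j, mu):
    """Gaussian kernel between the rows of A_j and the rows of B_j."""
    sq = (np.sum(A_j**2, axis=1)[:, None]
          + np.sum(B_j**2, axis=1)[None, :]
          - 2.0 * A_j @ B_j.T)
    return np.exp(-mu * np.maximum(sq, 0.0))

def make_local_reduced_matrix(A_j, m_bar, sigma, rng):
    """ASSUMPTION: random row subset of the party's data plus Gaussian noise."""
    rows = rng.choice(A_j.shape[0], size=m_bar, replace=False)
    return A_j[rows] + sigma * rng.standard_normal((m_bar, A_j.shape[1]))

def fit_psvm(K, d, nu):
    """Standard PSVM fit: solve (I/nu + E'E) z = E'd with E = [K, -e]."""
    E = np.hstack([K, -np.ones((K.shape[0], 1))])
    z = np.linalg.solve(np.eye(E.shape[1]) / nu + E.T @ E, E.T @ d)
    return z[:-1], z[-1]          # weights u and threshold gamma

def predict(K, u, gamma):
    return np.sign(K @ u - gamma)

# Hypothetical synthetic data: 200 samples, 4 features, two classes.
rng = np.random.default_rng(0)
d = np.concatenate([np.ones(100), -np.ones(100)])
A = rng.standard_normal((200, 4)) + 2.0 * d[:, None]

# Vertical partition: party 1 holds columns 0-1, party 2 holds columns 2-3.
A1, A2 = A[:, :2], A[:, 2:]
mu, nu, sigma, m_bar = 0.1, 1.0, 0.1, 10

# Each party builds its reduced matrix privately and shares only its kernel.
B1 = make_local_reduced_matrix(A1, m_bar, sigma, rng)
B2 = make_local_reduced_matrix(A2, m_bar, sigma, rng)
K1 = local_gaussian_kernel(A1, B1, mu)
K2 = local_gaussian_kernel(A2, B2, mu)

# For a Gaussian kernel, the elementwise product of the local kernels equals
# the kernel on the concatenated features.
K = K1 * K2

u, gamma = fit_psvm(K, d, nu)
accuracy = np.mean(predict(K, u, gamma) == d)
print(f"training accuracy: {accuracy:.3f}")
```

In this sketch the only numerical work in training is the single call to `np.linalg.solve`, on a system whose size is `m_bar + 1` rather than the number of samples. Neither party sees the other's columns or reduced matrix.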
Acknowledgments
This work is supported by the China Agricultural Research System (CARS-30).
Cite this article
Sun, L., Mu, WS., Qi, B. et al. A new privacy-preserving proximal support vector machine for classification of vertically partitioned data. Int. J. Mach. Learn. & Cyber. 6, 109–118 (2015). https://doi.org/10.1007/s13042-014-0245-1