Abstract
K-means is one of the most commonly used clustering algorithms, with applications in signal processing, artificial intelligence, and image processing, among other fields. Several variations and improvements of K-means exist, kernel K-means being the best known. K-means has been the subject of many studies aiming to improve its hardware and software implementations, several of which have focused on its parallelization. Kernelization transforms the data into a high-dimensional feature space by computing the inner product between every pair of data points. Consequently, kernel K-means involves additional computational steps and has higher computational requirements than standard K-means, and it has not attracted the same interest; much can still be done in terms of parallelized and robust implementations. This original research studies and develops different parallel implementations of kernel K-means on both the CPU and the GPU. The proposed CPU implementations use OpenMP and BLAS, while the GPU implementation focuses on CUDA, available on Nvidia GPUs. Several datasets with varying numbers of features and patterns are used. The results show that CUDA generally provides the best run-times, with speedups ranging from two to more than two hundred times over a single-core CPU implementation, depending on the dataset.
Baydoun, M., Ghaziri, H. & Al-Husseini, M. CPU and GPU parallelized kernel K-means. J Supercomput 74, 3975–3998 (2018). https://doi.org/10.1007/s11227-018-2405-7