Abstract
The k-median problem is the theoretical foundation of partitioning-based clustering algorithms. It was first proposed in 1964 and was later shown to be NP-hard on networks; in particular, the k-median problem under Euclidean distance is NP-hard. This study proves that the k-median problem under a connectivity measure is solvable in deterministic polynomial time and derives its optimal solution. Building on this result, a connectivity-based clustering theory and algorithm are proposed that obtain the theoretically optimal partition in polynomial time and achieve outstanding performance on real data sets.
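To make the objective concrete, the following is a minimal sketch of the k-median cost and an exhaustive search for its optimum. It is an illustration only, not the connectivity-based algorithm of this article: it assumes Euclidean distance, restricts centers to the data points themselves, and uses hypothetical names (`k_median_cost`, `brute_force_k_median`) and made-up toy data. The exhaustive search examines every k-subset of the points, which shows why the general problem becomes impractical as the data grow.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def k_median_cost(points, centers):
    """Sum, over all points, of the distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def brute_force_k_median(points, k):
    """Try every k-subset of the points as centers (assumption: centers
    are restricted to data points). The number of subsets is C(n, k),
    so this is only feasible for very small inputs."""
    best = min(combinations(points, k),
               key=lambda cs: k_median_cost(points, cs))
    return list(best), k_median_cost(points, best)


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

centers, cost = brute_force_k_median(points, 2)
print(centers, cost)
```

On this toy input the search selects one center in each group. Changing the distance function changes the objective being minimized, which is where a connectivity measure would enter.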
Data Availability
Publicly available data sets were analyzed in this study. The data can be found at http://archive.ics.uci.edu/ml/datasets.php.
Acknowledgements
This research was funded by the National Science Foundation of China (No. 61976158).
Ethics declarations
Competing interests
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wan, J., Zhang, K., Guo, Z. et al. A new clustering algorithm based on connectivity. Appl Intell 53, 20272–20292 (2023). https://doi.org/10.1007/s10489-023-04543-2
DOI: https://doi.org/10.1007/s10489-023-04543-2