A new clustering algorithm based on connectivity

Applied Intelligence

Abstract

The k-median problem is the theoretical foundation of partitioning-based clustering algorithms. It was first proposed in 1964. It was later shown to be NP-hard on general networks, and the k-median problem under Euclidean distance is also NP-hard. In this study, by contrast, the k-median problem under a connectivity measure is proved to be solvable in deterministic polynomial time, and its optimal solution is derived. Building on this result, a connectivity-based clustering theory and algorithm are proposed that obtain the theoretically optimal partition in polynomial time and perform strongly on real data sets.
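For readers unfamiliar with the objective, here is a minimal sketch of the standard discrete k-median problem under Euclidean distance. It is not the connectivity-based algorithm proposed in this article. Given data points, choose k of them as medians so that the sum, over all points, of the distance to the nearest median is minimized. The function names (`kmedian_cost`, `exact_kmedian`, `assign`) are hypothetical and chosen only for illustration. The brute-force search is exact, but it examines all C(n, k) candidate median sets, so it quickly becomes impractical as n and k grow.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmedian_cost(points: Sequence[Point], medians: Sequence[Point]) -> float:
    """Sum over all points of the Euclidean distance to the nearest median."""
    return sum(min(math.dist(p, m) for m in medians) for p in points)


def exact_kmedian(points: Sequence[Point], k: int) -> Tuple[Tuple[Point, ...], float]:
    """Exact discrete k-median by brute force (illustrative only).

    Every k-subset of the data points is tried as the median set, so the
    number of candidates is math.comb(len(points), k).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    best_medians: Tuple[Point, ...] = ()
    best_cost = math.inf
    for medians in combinations(points, k):
        cost = kmedian_cost(points, medians)
        if cost < best_cost:  # strict '<' keeps the first optimum found
            best_cost, best_medians = cost, medians
    return best_medians, best_cost


def assign(points: Sequence[Point], medians: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest median (ties go to the lower index)."""
    return [min(range(len(medians)), key=lambda j: math.dist(p, medians[j])) for p in points]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medians, cost = exact_kmedian(pts, 2)
    print(medians, cost)         # ((0, 0), (10, 10)) 4.0
    print(assign(pts, medians))  # [0, 0, 0, 1, 1, 1]
```

In the usage example, six points form two well-separated groups of three. The search picks one corner point of each group as its median and reaches a total cost of 4.0. The assignment step then labels the first three points 0 and the last three 1.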

[Figs. 1–10 and Algorithms 1–3]

Data Availability

Publicly available data sets were analyzed in this study. These data sets can be found in the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets.php.

Acknowledgements

This research was funded by the National Natural Science Foundation of China (grant No. 61976158).

Author information

Corresponding author

Correspondence to Jiaqiang Wan.

Ethics declarations

Competing interests

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wan, J., Zhang, K., Guo, Z. et al. A new clustering algorithm based on connectivity. Appl Intell 53, 20272–20292 (2023). https://doi.org/10.1007/s10489-023-04543-2

  • DOI: https://doi.org/10.1007/s10489-023-04543-2
