PCS-granularity weighted ensemble clustering via Co-association matrix

Applied Intelligence

Abstract

Ensemble clustering has attracted much attention for its robustness and effectiveness compared to single clustering. Most co-association matrix-based ensemble clustering methods, one of the representative families, take into account only a single type of information contained in the base partitions. This study proposes a new weighted ensemble clustering algorithm that fuses multi-level data information to mine the information in the base partition family more fully. Three levels of data information, namely the partition granularity level, the cluster granularity level and the sample granularity level, are considered together in the co-association matrix. More specifically, we use knowledge granularity to measure the quality of base partitions and rough membership to quantify the credibility of base clusters. Additionally, the relative similarity of a pair of samples is estimated with respect to the different base partitions, taking into account both the close relationship between samples and the structure of the base clusters. Subsequently, the partition-cluster-sample-granularity weighted co-association (PCSCA) matrix is proposed to address the limitations of the co-association matrix by quantifying the quality of information at multiple levels. Finally, this study introduces the partition-cluster-sample-granularity weighted ensemble clustering (PCSEC) algorithm, which incorporates the PCSCA matrix. The experimental results demonstrate the effectiveness of the proposed method.
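For readers unfamiliar with the co-association matrix that the abstract builds on, the following is a minimal sketch of the standard construction: entry (i, j) is the (optionally weighted) fraction of base partitions that place samples i and j in the same cluster. It is not the PCSCA matrix of this paper. The function name `co_association` and the per-partition `weights` argument are illustrative assumptions, and the cluster-level and sample-level weighting described above is not reproduced here.

```python
import numpy as np


def co_association(partitions, weights=None):
    """Plain or partition-weighted co-association matrix (illustrative sketch).

    partitions: array-like of shape (M, n), one row of integer cluster
                labels per base partition.
    weights:    optional length-M non-negative weights, one per base
                partition (a hypothetical stand-in for a partition-quality
                score); defaults to uniform weights.
    """
    labels = np.asarray(partitions)
    n_partitions = labels.shape[0]
    if weights is None:
        w = np.full(n_partitions, 1.0 / n_partitions)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()  # normalise so entries stay in [0, 1]
    # same[k, i, j] is True when samples i and j share a cluster in partition k
    same = labels[:, :, None] == labels[:, None, :]
    # weighted sum over the partition axis gives an (n, n) matrix
    return np.tensordot(w, same.astype(float), axes=1)


base_partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
ca_plain = co_association(base_partitions)
ca_weighted = co_association(base_partitions, weights=[2.0, 1.0, 1.0])
```

In the uniform case, samples 0 and 1 co-occur in two of the three base partitions, so `ca_plain[0, 1]` is 2/3. Giving the first partition twice the weight of the others raises that entry to 0.75. In evidence-accumulation approaches, a consensus partition is then commonly obtained by running hierarchical clustering on the matrix.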

Data Availability

The datasets analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. The synthetic datasets were obtained from https://github.com/milaan9/Clustering-Datasets/tree/master/02.%20Synthetic.

  2. The real datasets were obtained from http://archive.ics.uci.edu/ml/datasets.php.

Funding

The authors would like to thank the editors and anonymous reviewers for their constructive comments. This work is supported by NSFC (No. 12231007), Hunan Provincial Natural Science Foundation of China (No. 2023JJ30113), and Guangdong Basic and Applied Basic Research Foundation (No. 2023A1515012342).

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Mingjie Cai, Feng Xu and Qingguo Li. The first draft of the manuscript was written by Zhishan Wu and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Mingjie Cai or Feng Xu.

Ethics declarations

Ethical and Informed Consent for Data Used

The data used in the current study are ethical.

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, Z., Cai, M., Xu, F. et al. PCS-granularity weighted ensemble clustering via Co-association matrix. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05368-3

  • DOI: https://doi.org/10.1007/s10489-024-05368-3
