
Evaluating the Performance of FedCLUS Algorithm Using FedCI: A New Federated Cluster Validity Metric

  • Original Research
  • Published in SN Computer Science

Abstract

Federated learning is a recent trend in machine learning for building a collaborative model from distributed data while preserving privacy. The existing literature focuses on supervised federated learning algorithms that require labeled data, whereas only a few solutions have been proposed for identifying patterns in distributed unlabeled data using federated clustering methods. Moreover, the problem of measuring the goodness of clusters remains open, because existing cluster validity indices cannot be applied in federated learning, where the entire data is never available at one site. To fill this research gap, this paper proposes a new metric, FedCI, for measuring the performance of federated clustering methods. The rationale for FedCI is discussed, and the metric is validated by comparing it with the DB index and the Silhouette score; the behavior of FedCI is found to be consistent with these existing metrics. Further, FedCI is applied to FedCLUS, a recently proposed federated clustering method. FedCLUS has distinctive characteristics: it identifies arbitrarily shaped clusters; it can merge, split, and discard clusters reported by data owners; and it is communication-cost effective. The performance of FedCLUS is compared with centralized DBSCAN using FedCI on various datasets, and the results indicate that FedCLUS performs close to the centralized DBSCAN algorithm. FedCI is expected to guide the search for better clusters in federated settings.
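The abstract's validation strategy compares FedCI against centralized cluster validity indices. FedCI itself is defined in the full text; as a hedged illustration of the centralized baselines it is compared with, the sketch below runs DBSCAN on synthetic data and scores the result with the DB index and Silhouette score. The dataset, `eps`, and `min_samples` values are illustrative assumptions, not taken from the paper.

```python
# Sketch of the centralized baselines (DB index, Silhouette score) that
# FedCI is validated against. Uses standard scikit-learn APIs; the data
# and DBSCAN parameters are illustrative only.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic data standing in for the benchmark datasets used in the paper.
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.6, random_state=42)

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Both indices need at least two clusters; DBSCAN labels noise as -1,
# so noise points are excluded before scoring.
mask = labels != -1
db = davies_bouldin_score(X[mask], labels[mask])   # lower is better
sil = silhouette_score(X[mask], labels[mask])      # higher is better, in [-1, 1]
print(f"DB index: {db:.3f}, Silhouette: {sil:.3f}")
```

In a federated setting these scores cannot be computed as written, since no party holds the full matrix `X`; this is precisely the gap FedCI addresses.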

Figures 1–5 and Algorithms 1–2 appear in the full text (access required).


Author information

Corresponding author: Shachi Sharma.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Soft Computing in Engineering Applications” guest edited by Kanubhai K. Patel.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sharma, S., Gupta, S. Evaluating the Performance of FedCLUS Algorithm Using FedCI: A New Federated Cluster Validity Metric. SN COMPUT. SCI. 5, 332 (2024). https://doi.org/10.1007/s42979-024-02663-1

