Abstract
This paper describes a formally based approach to parallelizing the Kohonen algorithm for federated learning of Self-Organizing Maps (SOMs), a special kind of neural network. Our approach executes the parallel version of the algorithm directly on the distributed data sources and takes into account how the data are distributed across the nodes. Unlike traditional approaches, we distinguish two kinds of data distribution: horizontal (each node stores a subset of the records) and vertical (each node stores a subset of the attributes of every record). For both, our approach avoids gathering the data in a single storage and instead moves the computations closer to the nodes that hold the data. This reduces the execution time of the algorithm, the network traffic, and the risk of unauthorized access to the data during transmission. Our experimental evaluation demonstrates the advantages of the approach.
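To make the horizontal/vertical distinction concrete, here is a minimal sketch of one training epoch of the batch variant of the Kohonen algorithm, in which each unit's weight vector becomes the neighborhood-weighted mean of the data, w_j = Σ_i h(j, c_i) x_i / Σ_i h(j, c_i), with c_i the best-matching unit (BMU) of record x_i. It is an illustration under our own assumptions (a Gaussian neighborhood on a rectangular grid, a fixed radius `sigma`, a single coordinator, and the hypothetical helper names `neighborhood`, `local_sums`, `horizontal_epoch`, and `vertical_epoch`), not the authors' implementation. In the horizontal case each node sends only per-unit sums; in the vertical case each node sends partial squared distances over its own attributes, the coordinator picks the BMUs, and each node updates its own slice of the weights.

```python
import math
import random

# Hypothetical helpers for illustration only; not taken from the paper.

def neighborhood(j, c, cols, sigma):
    """Gaussian neighborhood h(j, c) between map units j and c on a grid with `cols` columns."""
    rj, cj = divmod(j, cols)
    rc, cc = divmod(c, cols)
    return math.exp(-((rj - rc) ** 2 + (cj - cc) ** 2) / (2.0 * sigma ** 2))


def sq_dist(x, w):
    """Squared Euclidean distance between a data vector and a weight vector."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def local_sums(rows, weights, cols, sigma):
    """Runs on one node of a horizontal partition: batch-SOM numerators and denominators."""
    m, d = len(weights), len(weights[0])
    num = [[0.0] * d for _ in range(m)]
    den = [0.0] * m
    for x in rows:
        c = min(range(m), key=lambda j: sq_dist(x, weights[j]))  # best-matching unit
        for j in range(m):
            h = neighborhood(j, c, cols, sigma)
            den[j] += h
            for k in range(d):
                num[j][k] += h * x[k]
    return num, den


def horizontal_epoch(partitions, weights, cols, sigma):
    """One batch epoch on horizontally partitioned data: only the sums leave the nodes."""
    m, d = len(weights), len(weights[0])
    num = [[0.0] * d for _ in range(m)]
    den = [0.0] * m
    for rows in partitions:  # sequential here; independent per node in a real deployment
        local_num, local_den = local_sums(rows, weights, cols, sigma)
        for j in range(m):
            den[j] += local_den[j]
            for k in range(d):
                num[j][k] += local_num[j][k]
    return [[num[j][k] / den[j] for k in range(d)] for j in range(m)]


def vertical_epoch(col_parts, weight_parts, cols, sigma):
    """One batch epoch on vertically partitioned data: node p holds attribute block p of every row."""
    n, m = len(col_parts[0]), len(weight_parts[0])
    # Each node computes an n x m matrix of partial squared distances over its own attributes.
    partial = [[[sq_dist(x, w) for w in wp] for x in rows]
               for rows, wp in zip(col_parts, weight_parts)]
    # The coordinator adds the partial distances and picks each row's best-matching unit.
    bmus = [min(range(m), key=lambda j: sum(p[i][j] for p in partial)) for i in range(n)]
    # Only the BMU indices go back; each node updates its own slice of the weight vectors.
    h = [[neighborhood(j, c, cols, sigma) for j in range(m)] for c in bmus]
    new_parts = []
    for rows, wp in zip(col_parts, weight_parts):
        d = len(wp[0])
        new_parts.append([
            [sum(h[i][j] * rows[i][k] for i in range(n)) / sum(h[i][j] for i in range(n))
             for k in range(d)]
            for j in range(m)])
    return new_parts


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(4)] for _ in range(40)]
    cols, sigma = 3, 1.0                      # hypothetical 2 x 3 map
    init = [list(data[i]) for i in range(6)]  # initialize the units from the first rows
    central = horizontal_epoch([data], init, cols, sigma)
    federated = horizontal_epoch([data[:25], data[25:]], init, cols, sigma)
    split = vertical_epoch([[x[:2] for x in data], [x[2:] for x in data]],
                           [[w[:2] for w in init], [w[2:] for w in init]], cols, sigma)
    joined = [a + b for a, b in zip(*split)]
    # Both differences should be at the level of floating-point rounding.
    print(max(abs(a - b) for u, v in zip(central, federated) for a, b in zip(u, v)))
    print(max(abs(a - b) for u, v in zip(central, joined) for a, b in zip(u, v)))
```

In this sketch both distributed variants reproduce the centralized batch update up to rounding, because the batch formula is a ratio of sums that can be accumulated per node (horizontal) or per attribute block (vertical). The communication pattern differs: a horizontal node sends m × d sums plus m denominators regardless of how many records it holds, whereas a vertical node sends an n × m distance matrix and receives n BMU indices.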
Acknowledgements
We are grateful to the anonymous reviewers, whose very helpful comments allowed us to significantly improve the paper. This work was supported by the German Ministry of Education and Research (BMBF) in the framework of project HPC2SE at the University of Muenster.
Cite this article
Kholod, I., Rukavitsyn, A., Paznikov, A. et al. Parallelization of the self-organized maps algorithm for federated learning on distributed sources. J Supercomput 77, 6197–6213 (2021). https://doi.org/10.1007/s11227-020-03509-2
DOI: https://doi.org/10.1007/s11227-020-03509-2