Abstract
We aim to improve the distributed implementation of data mining algorithms in modern Internet of Things (IoT) systems. The idea of our approach is to perform as many computations as possible at local IoT nodes, rather than transferring data to a central compute cluster for processing, as current MapReduce-based solutions do. We study different kinds of data distribution among the IoT nodes and adapt the structure of the implementation accordingly. Our formally based approach ensures the correctness of the resulting parallel implementation. We implement our approach in the Java-based data mining library DXelopes and illustrate it with the popular Naive Bayes algorithm. Experiments confirm that our approach significantly reduces the application run time.
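To make the idea of local computation concrete, the following minimal sketch shows how the counting step of Naive Bayes could be split across nodes when the data is distributed horizontally, that is, when each node holds complete records for a subset of the data. It is an illustration under stated assumptions, not the DXelopes API or the authors' implementation: the function names, the restriction to categorical attributes, and the toy records are all hypothetical. Each node sends only its count tables to the central site, never the raw records.

```python
from collections import Counter

def local_counts(records):
    """Count class labels and (attribute index, value, label) triples on one node."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in records:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts

def merge_counts(partials):
    """Sum the count tables received from the nodes (done at the central site)."""
    class_counts = Counter()
    feature_counts = Counter()
    for node_class_counts, node_feature_counts in partials:
        class_counts.update(node_class_counts)      # Counter.update adds counts
        feature_counts.update(node_feature_counts)
    return class_counts, feature_counts

# Hypothetical toy records held by two nodes: ((attribute values...), class label)
node_a = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
node_b = [(("sunny", "mild"), "yes"), (("rainy", "hot"), "no"), (("sunny", "hot"), "no")]

# Distributed variant: count locally, then merge the small count tables.
merged = merge_counts([local_counts(node_a), local_counts(node_b)])

# Centralized variant for comparison: pool all raw records first, then count.
central = local_counts(node_a + node_b)
```

Because counting is additive, the merged tables equal those obtained from the pooled data, so the class priors and conditional probabilities derived from them are the same in both variants. A vertical distribution, where each node holds different attributes of the same records, would need a different merge step and is not covered by this sketch.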
Acknowledgements
This work was supported by the Ministry of Education and Science of the Russian Federation in the framework of the state order “Organization of Scientific Research”, task 2.6113.2017/6.7, by the RFBR according to research project 19-07-00784, and by the German Ministry of Education and Research (BMBF) in the framework of project HPC2SE at the University of Muenster.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kholod, I., Shorov, A., Gorlatch, S. (2020). Improving Parallel Data Mining for Different Data Distributions in IoT Systems. In: Kotenko, I., Badica, C., Desnitsky, V., El Baz, D., Ivanovic, M. (eds) Intelligent Distributed Computing XIII. IDC 2019. Studies in Computational Intelligence, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-32258-8_9
DOI: https://doi.org/10.1007/978-3-030-32258-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32257-1
Online ISBN: 978-3-030-32258-8
eBook Packages: Intelligent Technologies and Robotics (R0)