Abstract
Distributed computing and data mining are now nearly ubiquitous. The authors propose a methodology for distributed data mining that combines local analytical models, built in parallel on the nodes of a distributed computer system, into a global model, without the need to construct a distributed version of the data mining algorithm. Different combining strategies for clustering and classification are proposed, along with methods for verifying them. The proposed solutions were tested on data sets from the UCI Machine Learning Repository.
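The abstract does not spell out the combining strategies, so the following is only an illustrative sketch of the general idea for classification: each node builds a local model from its own partition, and a global model is formed from the local ones without redistributing the data. Majority voting and the nearest-centroid learner are assumptions chosen for brevity, not the strategies proposed in the paper, and the names `train_local_model` and `combine_by_voting` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Sequence

Record = Sequence[float]
Model = Callable[[Record], str]


def train_local_model(partition: list[tuple[Record, str]]) -> Model:
    """Hypothetical local learner: nearest-centroid classifier for one node's data."""
    sums: dict[str, list[float]] = {}
    counts: Counter = Counter()
    for features, label in partition:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    # One centroid (per-feature mean) for every class seen on this node.
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }

    def predict(record: Record) -> str:
        # Return the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(record, centroids[label])
            ),
        )

    return predict


def combine_by_voting(models: list[Model]) -> Model:
    """Assumed combining strategy: the global model returns the majority vote."""

    def global_predict(record: Record) -> str:
        votes = Counter(model(record) for model in models)
        # On a tie, Counter.most_common keeps the label that was voted first.
        return votes.most_common(1)[0][0]

    return global_predict


# Three made-up partitions standing in for the data held by three nodes.
partitions = [
    [((0.0, 0.1), "a"), ((1.0, 0.9), "b")],
    [((0.2, 0.0), "a"), ((0.9, 1.1), "b")],
    [((0.1, 0.2), "a"), ((1.2, 1.0), "b")],
]
global_model = combine_by_voting([train_local_model(p) for p in partitions])
print(global_model((0.1, 0.1)), global_model((1.0, 1.0)))
```

In a real deployment the local models would be trained in parallel on separate nodes and only the models, not the raw records, would be sent to the combiner; the loop over `partitions` here merely simulates that on one machine.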
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gorawski, M., Pluciennik-Psota, E. (2010). Distributed Data Mining Methodology for Clustering and Classification Model. In: Rutkowski, L., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2010. Lecture Notes in Computer Science, vol 6113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13208-7_41
DOI: https://doi.org/10.1007/978-3-642-13208-7_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13207-0
Online ISBN: 978-3-642-13208-7
eBook Packages: Computer Science, Computer Science (R0)