Parallel Data Mining on ATM-Connected PC Cluster and Optimization of its Execution Environments

Oguchi, Masato; Kitsuregawa, Masaru

doi:10.1007/3-540-45591-4_48

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1800)

Included in the following conference series:

International Parallel and Distributed Processing Symposium

Abstract

In this paper, we have constructed a large-scale ATM-connected PC cluster consisting of 100 PCs, implemented a data mining application on it, and optimized its execution environment. The default parameters of the TCP retransmission mechanism cannot provide good performance for the data mining application, since many collisions occur during all-to-all multicasting in the large-scale PC cluster. By setting the TCP retransmission parameters according to the proposed parameter optimization, a reasonably good performance improvement is achieved for parallel data mining on 100 PCs.
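To illustrate why retransmission timer settings can matter on a low-latency cluster network, the following is a minimal sketch of a generic TCP-style retransmission timer with exponential backoff. It is not the authors' optimization: the abstract does not say which parameters were tuned, so the function name, the sample round-trip values, and the choice to cap the maximum interval are all assumptions made for illustration.

```python
def retransmission_delays(srtt, rttvar, rto_min, rto_max, retries):
    """Return the wait (in seconds) before each successive retransmission.

    Generic model: the initial timeout is srtt + 4 * rttvar, clamped to
    [rto_min, rto_max], and it doubles after every timeout up to rto_max.
    """
    rto = min(max(srtt + 4 * rttvar, rto_min), rto_max)
    delays = []
    for _ in range(retries):
        delays.append(rto)
        rto = min(rto * 2, rto_max)  # exponential backoff, capped
    return delays


# Hypothetical LAN measurements: about 1 ms round trip, small variance.
lan = {"srtt": 0.001, "rttvar": 0.0005}

# Hypothetical "default-like" setting: large maximum interval.
default_like = retransmission_delays(**lan, rto_min=0.2, rto_max=60.0, retries=4)

# Hypothetical "tuned" setting: the maximum interval is capped much lower.
tuned_like = retransmission_delays(**lan, rto_min=0.2, rto_max=0.4, retries=4)

print(default_like)  # [0.2, 0.4, 0.8, 1.6]
print(tuned_like)    # [0.2, 0.4, 0.4, 0.4]
```

In this model the measured round trip is far below the minimum timeout, so every lost segment stalls the sender for at least `rto_min`, and repeated losses during an all-to-all exchange grow the stall quickly unless the backoff is bounded.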

Association rule mining, one of the best-known problems in data mining, differs from conventional scientific calculations in its usage of main memory. We have investigated the feasibility of using available memory on remote nodes as a swap area when working nodes need to swap out their real memory contents. According to the experimental results on our PC cluster, the proposed method is expected to be considerably better than using hard disks as a swapping device.
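As a rough illustration of why association rule mining leans on main memory, the sketch below counts candidate itemsets in an in-memory table and derives support and confidence from it. This is a naive single-node version written for this summary, not the parallel algorithm used in the paper; the function names and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, size, min_support):
    """Return {itemset: support} for itemsets of the given size.

    Every candidate itemset seen in the data gets a counter held in
    memory, which is what makes the working set grow with the data.
    """
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    total = len(transactions)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    lhs = set(antecedent)
    both = lhs | set(consequent)
    n_lhs = sum(1 for t in transactions if lhs <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_lhs if n_lhs else 0.0


# Hypothetical market-basket data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

pairs = frequent_itemsets(baskets, size=2, min_support=0.5)
rule_conf = confidence(baskets, ["bread"], ["milk"])
print(pairs)
print(rule_conf)
```

The number of candidate counters can far exceed the number of distinct items, so the counting table may outgrow physical memory on a node, which is the situation in which swapping to a remote node's memory or to disk becomes relevant.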

Author information

Authors and Affiliations

Institute of Industrial Science, The University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo, 106-8558, Japan
Masato Oguchi & Masaru Kitsuregawa
Informatik4, Aachen University of Technology, Ahornstr.55, D-52056, Aachen, Germany
Masato Oguchi

Editor information

Editors and Affiliations

Centre Universitaire d’Informatique, Université de Genève, 24, rue Général Dufour, CH-1211, Genève 4, Switzerland
José Rolim

About this paper

Cite this paper

Oguchi, M., Kitsuregawa, M. (2000). Parallel Data Mining on ATM-Connected PC Cluster and Optimization of its Execution Environments. In: Rolim, J. (eds) Parallel and Distributed Processing. IPDPS 2000. Lecture Notes in Computer Science, vol 1800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45591-4_48

DOI: https://doi.org/10.1007/3-540-45591-4_48
Published: 25 May 2000
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67442-9
Online ISBN: 978-3-540-45591-2
eBook Packages: Springer Book Archive
