Parallel Data Mining on ATM-Connected PC Cluster and Optimization of its Execution Environments

  • Conference paper
Parallel and Distributed Processing (IPDPS 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1800)

Abstract

In this paper, we have constructed a large-scale ATM-connected PC cluster consisting of 100 PCs, implemented a data mining application on it, and optimized its execution environment. The default parameters of the TCP retransmission mechanism cannot provide good performance for the data mining application, because many collisions occur during all-to-all multicasting in a large-scale PC cluster. Using TCP retransmission parameters set according to the proposed parameter optimization, a reasonably good performance improvement is achieved for parallel data mining on 100 PCs.
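
To illustrate why retransmission timer settings can matter when many packets are lost at once, the sketch below models the standard exponential backoff of a TCP retransmission timeout (the timeout doubles after each consecutive loss, up to a cap). The function name `total_stall` and all numeric values are assumptions chosen for illustration; they are not the parameters or results reported in the paper.

```python
def total_stall(initial_rto: float, losses: int, max_rto: float = 60.0) -> float:
    """Total time a sender waits across `losses` consecutive timeouts.

    Each timeout doubles the retransmission timeout (RTO), capped at max_rto.
    """
    stall = 0.0
    rto = initial_rto
    for _ in range(losses):
        stall += rto
        rto = min(rto * 2, max_rto)
    return stall


# Hypothetical values: a 1.0 s versus a 0.2 s initial timeout, 3 consecutive losses.
print(total_stall(1.0, 3))  # 1.0 + 2.0 + 4.0
print(total_stall(0.2, 3))  # 0.2 + 0.4 + 0.8
```

Under this simplified model, a smaller initial timeout shortens the idle time after repeated losses, which is the kind of effect a parameter optimization would aim to exploit.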

Association rule mining, one of the best-known problems in data mining, differs from conventional scientific calculations in its usage of main memory. We have investigated the feasibility of using available memory on remote nodes as a swap area when working nodes need to swap out their real memory contents. According to the experimental results on our PC cluster, the proposed method is expected to be considerably better than using hard disks as a swapping device.
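
As a rough illustration of the memory usage mentioned above, the following sketch counts the support of candidate 2-itemsets over a toy transaction set, keeping the whole candidate table in memory. It is a generic Apriori-style example, not the authors' parallel implementation; the function names and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_pair_support(transactions):
    """Count how many transactions contain each pair of items."""
    counts = Counter()
    for items in transactions:
        # Sort and deduplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(transactions, min_support):
    """Keep only pairs that appear in at least min_support transactions."""
    counts = count_pair_support(transactions)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy data (hypothetical): each transaction is a list of purchased items.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
print(frequent_pairs(transactions, min_support=2))
```

The number of possible pairs grows quadratically with the number of distinct items, so a counting table of this kind can outgrow a node's physical memory on realistic data.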




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Oguchi, M., Kitsuregawa, M. (2000). Parallel Data Mining on ATM-Connected PC Cluster and Optimization of its Execution Environments. In: Rolim, J. (eds) Parallel and Distributed Processing. IPDPS 2000. Lecture Notes in Computer Science, vol 1800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45591-4_48

  • DOI: https://doi.org/10.1007/3-540-45591-4_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67442-9

  • Online ISBN: 978-3-540-45591-2

  • eBook Packages: Springer Book Archive
