
Parallel Data Mining Experimentation Using Flexible Configurations

  • Conference paper
Rough Sets and Current Trends in Computing (RSCTC 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2475)


Abstract

When data mining first appeared, several disciplines related to data analysis, such as statistics and artificial intelligence, were combined toward a new goal: extracting significant patterns from data. The original data sources were small datasets and, therefore, traditional machine learning techniques were the most common tools for these tasks. As the volume of data grew, these traditional methods were reviewed and extended with knowledge from experts working in the field of data management and databases. Today's problems are even bigger than before and, once again, a new discipline allows researchers to scale up to these data: distributed and parallel processing. In order to use parallel processing techniques, specific factors about the mining algorithms and the data should be considered. Nowadays, there are several new parallel algorithms that, in most cases, are extensions of a traditional centralized algorithm. Many of these algorithms share common core parts and differ only in distribution schema, parallel coordination, or load/task balancing methods. We call these groups algorithm families. In this paper we introduce a methodology to implement algorithm families. This methodology is founded on the MOIRAE distributed control architecture. In this work we show how this architecture allows researchers to design parallel processing components that can dynamically change their behavior according to control policies.
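The notion of an algorithm family can be illustrated with a minimal sketch: one shared core (here, a simple frequency count) combined with an interchangeable distribution policy that can be replaced at run time. Every name in it (`block_partition`, `round_robin_partition`, `CountingComponent`, `set_policy`) is hypothetical and chosen for illustration only; it is not the MOIRAE API and not the authors' implementation, and the "workers" are simulated sequentially rather than run in parallel.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a distribution policy maps (data, workers) to chunks.
Policy = Callable[[Sequence[int], int], List[List[int]]]


def block_partition(data: Sequence[int], workers: int) -> List[List[int]]:
    """Split data into contiguous chunks of near-equal size."""
    size = max(1, -(-len(data) // workers))  # ceiling division, at least 1
    return [list(data[i:i + size]) for i in range(0, len(data), size)]


def round_robin_partition(data: Sequence[int], workers: int) -> List[List[int]]:
    """Deal items to workers in turn, one item at a time."""
    return [list(data[i::workers]) for i in range(workers)]


class CountingComponent:
    """Shared core of the family; only the distribution policy varies."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy

    def set_policy(self, policy: Policy) -> None:
        # Stand-in for a control decision that changes behavior dynamically.
        self.policy = policy

    def run(self, data: Sequence[int], workers: int) -> Dict[int, int]:
        # Each chunk is counted independently (simulated worker), then merged.
        partials = [Counter(chunk) for chunk in self.policy(data, workers)]
        total: Counter = Counter()
        for partial in partials:
            total.update(partial)
        return dict(total)
```

Switching the policy with `set_policy` changes how the data is distributed but not the result of the shared core, which is the property that lets members of one family be compared under different configurations.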

This research project is funded under the Universidad Politécnica de Madrid grant program




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peña, J.M., Javier Crespo, F., Menasalvas, E., Robles, V. (2002). Parallel Data Mining Experimentation Using Flexible Configurations. In: Alpigini, J.J., Peters, J.F., Skowron, A., Zhong, N. (eds) Rough Sets and Current Trends in Computing. RSCTC 2002. Lecture Notes in Computer Science, vol 2475. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45813-1_58


  • DOI: https://doi.org/10.1007/3-540-45813-1_58


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44274-5

  • Online ISBN: 978-3-540-45813-5

  • eBook Packages: Springer Book Archive
