Part of the book series: Massive Computing (MACO, volume 6)

Abstract

Measuring the quality of a prediction rule is a difficult task that can involve several criteria. Most of the rule induction literature focuses on discovering accurate, comprehensible rules. In this chapter we take these two criteria into account, but we go beyond them by aiming to discover rules that are also interesting (surprising) to the user. Hence, the search for rules is guided by a rule-evaluation function that considers both the degree of predictive accuracy and the degree of interestingness of candidate rules. The search is performed by two versions of a genetic algorithm (GA) specifically designed for the discovery of interesting rules, or “knowledge nuggets.” The algorithm addresses the dependence modeling task (sometimes called “generalized rule induction”), where different rules can predict different goal attributes. This task can be regarded as a generalization of the well-known classification task, where all rules predict the same goal attribute. This chapter also compares the results of the two versions of the GA with those of a simpler, greedy rule induction algorithm for discovering interesting rules.
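
As a rough illustration of the kind of rule-evaluation function the abstract describes, the Python sketch below scores a candidate rule by a weighted average of its predictive accuracy and an interestingness term. The abstract does not give the chapter's actual formulas, so everything here is an assumption made for illustration: the names `Rule`, `covers`, `predictive_accuracy`, `interestingness` and `evaluate`, the weights `w_acc` and `w_int`, the toy data, and in particular the interestingness proxy (one minus the relative frequency of the predicted value, so that rules predicting rarer values count as more surprising).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF every (attribute, value) pair in `antecedent` holds THEN goal_attr = goal_value."""
    antecedent: tuple  # tuple of (attribute, value) pairs
    goal_attr: str     # may differ from rule to rule (dependence modeling)
    goal_value: str


def covers(rule, record):
    """True if the record satisfies every condition in the rule antecedent."""
    return all(record.get(attr) == value for attr, value in rule.antecedent)


def predictive_accuracy(rule, data):
    """Fraction of covered records whose goal attribute matches the prediction."""
    covered = [r for r in data if covers(rule, r)]
    if not covered:
        return 0.0
    correct = sum(1 for r in covered if r.get(rule.goal_attr) == rule.goal_value)
    return correct / len(covered)


def interestingness(rule, data):
    """Assumed proxy, not the chapter's measure: rarer predicted values score higher."""
    if not data:
        return 0.0
    matches = sum(1 for r in data if r.get(rule.goal_attr) == rule.goal_value)
    return 1.0 - matches / len(data)


def evaluate(rule, data, w_acc=0.5, w_int=0.5):
    """Weighted average of accuracy and interestingness (weights are hypothetical)."""
    acc = predictive_accuracy(rule, data)
    intr = interestingness(rule, data)
    return (w_acc * acc + w_int * intr) / (w_acc + w_int)


# Toy data set, invented for this example.
DATA = [
    {"weather": "rain", "traffic": "heavy", "late": "yes"},
    {"weather": "rain", "traffic": "light", "late": "no"},
    {"weather": "sun", "traffic": "heavy", "late": "no"},
    {"weather": "sun", "traffic": "light", "late": "no"},
]

# Two rules that predict different goal attributes.
rule_late = Rule((("weather", "rain"), ("traffic", "heavy")), "late", "yes")
rule_traffic = Rule((("weather", "sun"),), "traffic", "heavy")

print(evaluate(rule_late, DATA))     # 0.875
print(evaluate(rule_traffic, DATA))  # 0.5
```

In a GA, a function like `evaluate` would serve as the fitness of an individual that encodes one rule. Because the two example rules predict different goal attributes (`late` and `traffic`), the sketch also shows how dependence modeling differs from classification, where every rule would share a single goal attribute.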

Triantaphyllou, E. and G. Felici (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Massive Computing Series, Springer, Heidelberg, Germany, pp. 395–432, 2006.

Copyright information

© 2006 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Noda, E., Freitas, A.A. (2006). Discovering Knowledge Nuggets with a Genetic Algorithm. In: Triantaphyllou, E., Felici, G. (eds) Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/0-387-34296-6_12

  • DOI: https://doi.org/10.1007/0-387-34296-6_12

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-34294-8

  • Online ISBN: 978-0-387-34296-2

  • eBook Packages: Computer Science (R0)
