Abstract
Measuring the quality of a prediction rule is a difficult task that can involve several criteria. Most of the rule induction literature focuses on discovering accurate, comprehensible rules. In this chapter we also take these two criteria into account, but we go beyond them by aiming to discover rules that are interesting (surprising) for the user. Hence, the search for rules is guided by a rule-evaluation function that considers both the degree of predictive accuracy and the degree of interestingness of candidate rules. The search is performed by two versions of a genetic algorithm (GA) designed specifically for the discovery of interesting rules, or “knowledge nuggets.” The algorithm addresses the dependence modeling task (sometimes called “generalized rule induction”), where different rules can predict different goal attributes. This task can be regarded as a generalization of the well-known classification task, where all rules predict the same goal attribute. The chapter also compares the results of the two versions of the GA with those of a simpler, greedy rule induction algorithm for discovering interesting rules.
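To make the idea of a rule-evaluation function that weighs both accuracy and interestingness more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the chapter's actual fitness function: the `Rule` class, the `surprisingness` proxy (one minus the relative frequency of the predicted goal value), the equal default weights, and the toy records are all hypothetical. The GA search itself is omitted. Note that the two example rules predict different goal attributes, as in dependence modeling.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule: IF antecedent THEN goal_attribute = goal_value."""
    antecedent: dict      # attribute name -> required value
    goal_attribute: str   # may differ from rule to rule (dependence modeling)
    goal_value: str


def covers(rule, record):
    """True if the record satisfies every condition in the antecedent."""
    return all(record.get(a) == v for a, v in rule.antecedent.items())


def predictive_accuracy(rule, records):
    """Fraction of covered records whose goal attribute has the predicted value."""
    covered = [r for r in records if covers(rule, r)]
    if not covered:
        return 0.0
    correct = sum(1 for r in covered
                  if r.get(rule.goal_attribute) == rule.goal_value)
    return correct / len(covered)


def surprisingness(rule, records):
    """Assumed proxy for interestingness: rarer predicted values score higher."""
    if not records:
        return 0.0
    freq = sum(1 for r in records
               if r.get(rule.goal_attribute) == rule.goal_value) / len(records)
    return 1.0 - freq


def fitness(rule, records, w_acc=0.5, w_int=0.5):
    """Weighted average of accuracy and the interestingness proxy (weights assumed)."""
    total = w_acc + w_int
    return (w_acc * predictive_accuracy(rule, records)
            + w_int * surprisingness(rule, records)) / total


# Toy data set, invented purely for illustration.
RECORDS = [
    {"age": "young", "income": "low", "buys": "no"},
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "old", "income": "high", "buys": "yes"},
    {"age": "old", "income": "low", "buys": "no"},
]

# Two candidate rules that predict different goal attributes.
RULE_BUYS = Rule({"income": "high"}, "buys", "yes")
RULE_INCOME = Rule({"age": "young"}, "income", "low")
```

In a GA, a function like `fitness` would be used to rank candidate rules during selection; changing `w_acc` and `w_int` shifts the search toward more accurate or more surprising rules.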
Triantaphyllou, E. and G. Felici (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Massive Computing Series, Springer, Heidelberg, Germany, pp. 395–432, 2006.
Copyright information
© 2006 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Noda, E., Freitas, A.A. (2006). Discovering Knowledge Nuggets with a Genetic Algorithm. In: Triantaphyllou, E., Felici, G. (eds) Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/0-387-34296-6_12
DOI: https://doi.org/10.1007/0-387-34296-6_12
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34294-8
Online ISBN: 978-0-387-34296-2
eBook Packages: Computer Science (R0)