Discovering Knowledge Nuggets with a Genetic Algorithm

Noda, Edgar; Freitas, Alex A.

doi:10.1007/0-387-34296-6_12

Part of the book series: Massive Computing (MACO, volume 6)

Abstract

Measuring the quality of a prediction rule is a difficult task that can involve several criteria. Most of the rule induction literature focuses on discovering accurate, comprehensible rules. In this chapter we also take these two criteria into account, but we go beyond them by aiming to discover rules that are interesting (surprising) for the user. Hence, the search for rules is guided by a rule-evaluation function that considers both the predictive accuracy and the degree of interestingness of candidate rules. The search is performed by two versions of a genetic algorithm (GA) specifically designed for the discovery of interesting rules, or “knowledge nuggets.” The algorithm addresses the dependence modeling task (sometimes called “generalized rule induction”), where different rules can predict different goal attributes. This task can be regarded as a generalization of the well-known classification task, where all rules predict the same goal attribute. This chapter also compares the results of the two versions of the GA with those of a simpler, greedy rule induction algorithm for discovering interesting rules.
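The idea of a rule-evaluation function that balances predictive accuracy against interestingness, in a dependence modeling setting where each rule may predict a different goal attribute, can be sketched as follows. This is a minimal illustration under stated assumptions, not the fitness function used in the chapter: the `Rule` representation, the Laplace-corrected accuracy estimate, the surprisingness term (1 minus the prior frequency of the predicted value), and the weights `w_acc` and `w_int` are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    # Hypothetical representation: a conjunction of attribute == value conditions.
    antecedent: Tuple[Tuple[str, str], ...]
    # (goal attribute, predicted value); in dependence modeling, different
    # rules may predict different goal attributes.
    consequent: Tuple[str, str]

    def covers(self, rec: Record) -> bool:
        """True if the record satisfies every antecedent condition."""
        return all(rec.get(attr) == val for attr, val in self.antecedent)


def predictive_accuracy(rule: Rule, data: List[Record]) -> float:
    """Laplace-corrected accuracy over the records covered by the antecedent (an assumption)."""
    goal, value = rule.consequent
    covered = [r for r in data if rule.covers(r)]
    correct = sum(1 for r in covered if r[goal] == value)
    n_values = len({r[goal] for r in data})  # distinct values of the goal attribute
    return (correct + 1) / (len(covered) + n_values)


def consequent_interestingness(rule: Rule, data: List[Record]) -> float:
    """Hypothetical surprisingness term: rarer predicted values score higher."""
    goal, value = rule.consequent
    prior = sum(1 for r in data if r[goal] == value) / len(data)
    return 1.0 - prior


def fitness(rule: Rule, data: List[Record], w_acc: float = 2.0, w_int: float = 1.0) -> float:
    """Weighted average of accuracy and interestingness (weights are illustrative)."""
    acc = predictive_accuracy(rule, data)
    interest = consequent_interestingness(rule, data)
    return (w_acc * acc + w_int * interest) / (w_acc + w_int)


# Toy data set (made up for illustration).
weather = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]
rule = Rule(antecedent=(("windy", "yes"),), consequent=("play", "no"))
print(round(fitness(rule, weather), 3))  # → 0.667
```

Weighting accuracy above interestingness here is an arbitrary choice; shifting `w_acc` and `w_int` moves the search toward more accurate or more surprising rules, which is the trade-off the abstract describes. A GA would use a function like `fitness` to rank candidate rules in its population.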

Triantaphyllou, E. and G. Felici (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Massive Computing Series, Springer, Heidelberg, Germany, pp. 395–432, 2006.

Author information

Authors and Affiliations

Edgar Noda: School of Electrical & Comp. Eng. (FEEC), State University of Campinas (UNICAMP), Campinas-SP, Brazil
Alex A. Freitas: Computing Laboratory, University of Kent, Canterbury, Kent, CT2 7NF, UK

Editor information

Editors and Affiliations

Evangelos Triantaphyllou: Louisiana State University, Baton Rouge, Louisiana, USA
Giovanni Felici: Consiglio Nazionale delle Ricerche, Rome, Italy

About this chapter

Cite this chapter

Noda, E., Freitas, A.A. (2006). Discovering Knowledge Nuggets with a Genetic Algorithm. In: Triantaphyllou, E., Felici, G. (eds) Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/0-387-34296-6_12

DOI: https://doi.org/10.1007/0-387-34296-6_12
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34294-8
Online ISBN: 978-0-387-34296-2
eBook Packages: Computer Science, Computer Science (R0)