Abstract
The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree and generates a local naive Bayesian classifier at each leaf. The tests leading to a leaf can alleviate attribute inter-dependencies for the local naive Bayesian classifier. However, Bayesian tree learning still suffers from the small disjunct problem of tree learning. Although inferred Bayesian trees demonstrate low average prediction error rates, there is reason to believe that error rates will be higher for those leaves with few training examples. This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, LBR. This algorithm can be justified by a variant of Bayes' theorem that supports a weaker conditional attribute independence assumption than naive Bayes requires. For each test example, LBR builds the most appropriate rule with a local naive Bayesian classifier as its consequent. The computational requirements of LBR are demonstrated to be reasonable across a wide cross-section of natural domains. Experiments with these domains show that, on average, the new algorithm obtains lower error rates significantly more often than the reverse in comparison to a naive Bayesian classifier, C4.5, a Bayesian tree learning algorithm, a constructive Bayesian classifier that eliminates attributes and constructs new attributes using Cartesian products of existing nominal attributes, and a lazy decision tree learning algorithm. It also outperforms a selective naive Bayesian classifier, although that result is not statistically significant.
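The local naive Bayesian classifiers that the abstract places at each leaf (and that LBR uses as a rule's consequent) rest on the standard naive Bayes computation: choose the class maximizing the class prior times the product of per-attribute conditional probabilities. The following is a minimal illustrative sketch for nominal attributes with Laplace smoothing; the `NaiveBayes` class and the toy weather-style data are assumptions for demonstration, not code from the paper.

```python
from collections import defaultdict

class NaiveBayes:
    """Minimal naive Bayesian classifier for nominal attributes,
    using Laplace smoothing. Illustrative sketch only."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_count = defaultdict(int)
        # counts[(c, i, v)]: class-c training examples with value v on attribute i
        self.counts = defaultdict(int)
        self.values = [set() for _ in range(len(X[0]))]
        for xi, c in zip(X, y):
            self.class_count[c] += 1
            for i, v in enumerate(xi):
                self.counts[(c, i, v)] += 1
                self.values[i].add(v)
        return self

    def predict(self, x):
        n = sum(self.class_count.values())
        best_c, best_p = None, -1.0
        for c in self.classes:
            # P(c) * prod_i P(x_i | c), each factor Laplace-smoothed
            p = self.class_count[c] / n
            for i, v in enumerate(x):
                p *= (self.counts[(c, i, v)] + 1) / (
                    self.class_count[c] + len(self.values[i]))
            if p > best_p:
                best_c, best_p = c, p
        return best_c

# Tiny hypothetical training set: (outlook, temperature) -> play
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
nb = NaiveBayes().fit(X, y)
print(nb.predict(("rain", "hot")))   # -> yes
print(nb.predict(("sunny", "hot")))  # -> no
```

In LBR's setting, such a classifier would be trained only on the subset of examples satisfying the antecedent of the rule built for the current test case, which is how the rule's tests weaken the attribute independence assumption for the local model.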
Zheng, Z., Webb, G.I. Lazy Learning of Bayesian Rules. Machine Learning 41, 53–84 (2000). https://doi.org/10.1023/A:1007613203719