Improving naive Bayes classifier by dividing its decision regions

Yan, Zhi-yong; Xu, Cong-fu; Pan, Yun-he

doi:10.1631/jzus.C1000437

Published: 02 August 2011

Volume 12, pages 647–657, (2011)

Journal of Zhejiang University SCIENCE C


Abstract

Classification can be regarded as dividing the data space into decision regions separated by decision boundaries. In this paper we analyze decision tree algorithms and the NBTree algorithm from this perspective. Under this view, a decision tree can be regarded as a classifier tree, in which each classifier on a non-root node is trained in the decision regions of the classifier on its parent node. Similarly, the NBTree algorithm, which generates a classifier tree with the C4.5 algorithm as the root classifier and naive Bayes classifiers as the leaf classifiers, can be regarded as training naive Bayes classifiers in the decision regions of the C4.5 algorithm. We propose a second division (SD) algorithm and three soft second division (SD-soft) algorithms to train classifiers in the decision regions of the naive Bayes classifier. All four algorithms generate two-level classifier trees with the naive Bayes classifier as the root classifier. The SD and SD-soft algorithms can make good use of both the information contained in instances near decision boundaries and the information that the naive Bayes classifier may ignore. Finally, we conduct experiments on 30 data sets from the UC Irvine (UCI) repository. Experimental results show that the SD algorithm can obtain better generalization ability than the NBTree and the averaged one-dependence estimators (AODE) algorithms when using the C4.5 algorithm and the support vector machine (SVM) as leaf classifiers. Further experiments indicate that the three SD-soft algorithms can achieve better generalization ability than the SD algorithm when parameter values are selected appropriately.
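
The two-level classifier tree described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract gives no algorithmic details. It assumes categorical attributes, treats the decision region of a class as the set of instances the naive Bayes root assigns to that class, and uses a one-attribute rule (the hypothetical `Stump` class) as a stand-in for the C4.5 or SVM leaf classifiers. All class names and the toy data are illustrative.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (root classifier)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = Counter(y)
        self.n = len(y)
        # values[j] = distinct values seen for attribute j
        self.values = [set(col) for col in zip(*X)]
        # counts[(c, j)][v] = number of class-c rows with attribute j == v
        self.counts = defaultdict(Counter)
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(c, j)][v] += 1
        return self

    def predict_one(self, row):
        def log_score(c):
            s = math.log(self.prior[c] / self.n)
            for j, v in enumerate(row):
                num = self.counts[(c, j)][v] + 1
                den = self.prior[c] + len(self.values[j])
                s += math.log(num / den)
            return s
        return max(self.classes, key=log_score)


class Stump:
    """One-attribute rule; an assumed stand-in for a C4.5 or SVM leaf."""

    def fit(self, X, y):
        self.default = Counter(y).most_common(1)[0][0]
        best = None
        for j in range(len(X[0])):
            by_value = defaultdict(Counter)
            for row, c in zip(X, y):
                by_value[row[j]][c] += 1
            # Map each attribute value to its majority class.
            rule = {v: cnt.most_common(1)[0][0] for v, cnt in by_value.items()}
            hits = sum(cnt[rule[v]] for v, cnt in by_value.items())
            if best is None or hits > best[0]:
                best = (hits, j, rule)
        _, self.attr, self.rule = best
        return self

    def predict_one(self, row):
        return self.rule.get(row[self.attr], self.default)


class SecondDivision:
    """Two-level classifier tree: naive Bayes root, one leaf per decision region."""

    def fit(self, X, y):
        self.root = NaiveBayes().fit(X, y)
        # Group training instances by the region the root assigns them to.
        regions = defaultdict(lambda: ([], []))
        for row, c in zip(X, y):
            rows, labels = regions[self.root.predict_one(row)]
            rows.append(row)
            labels.append(c)
        self.leaves = {r: Stump().fit(rows, labels)
                       for r, (rows, labels) in regions.items()}
        return self

    def predict_one(self, row):
        region = self.root.predict_one(row)
        leaf = self.leaves.get(region)
        # A region with no training instances falls back to the root's answer.
        return leaf.predict_one(row) if leaf else region


def accuracy(model, X, y):
    return sum(model.predict_one(r) == c for r, c in zip(X, y)) / len(y)


# Toy XOR-like data: the class depends on an interaction between attributes.
X = [(0, 0)] * 3 + [(1, 1)] + [(0, 1)] + [(1, 0)] * 3
y = ["N"] * 3 + ["N"] + ["P"] + ["P"] * 3

nb_acc = accuracy(NaiveBayes().fit(X, y), X, y)
sd_acc = accuracy(SecondDivision().fit(X, y), X, y)
print(nb_acc, sd_acc)  # 0.75 1.0 (training accuracy on the toy data)
```

On this toy set the root misclassifies the two minority patterns because it cannot model the attribute interaction, while each leaf only has to separate the instances inside one region. The figures are training accuracy on eight hand-made rows and say nothing about the generalization results reported in the paper.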



Author information

Authors and Affiliations

Institute of Artificial Intelligence, Zhejiang University, Hangzhou, 310027, China
Zhi-yong Yan, Cong-fu Xu & Yun-he Pan

Corresponding author

Correspondence to Cong-fu Xu.

Additional information

Project supported by the National Natural Science Foundation of China (No. 60970081) and the National Basic Research Program (973) of China (No. 2010CB327903).

About this article

Cite this article

Yan, Zy., Xu, Cf. & Pan, Yh. Improving naive Bayes classifier by dividing its decision regions. J. Zhejiang Univ. - Sci. C 12, 647–657 (2011). https://doi.org/10.1631/jzus.C1000437

Received: 20 December 2010
Revised: 08 April 2011
Published: 02 August 2011
Issue Date: August 2011
DOI: https://doi.org/10.1631/jzus.C1000437

Key words

CLC number

TP181
