Improving naive Bayes classifier by dividing its decision regions

Journal of Zhejiang University SCIENCE C

Abstract

Classification can be regarded as dividing the data space into decision regions separated by decision boundaries. In this paper we analyze decision tree algorithms and the NBTree algorithm from this perspective. Under this view, a decision tree is a classifier tree in which each classifier on a non-root node is trained in the decision regions of the classifier on its parent node. Similarly, the NBTree algorithm, which generates a classifier tree with the C4.5 algorithm as the root classifier and the naive Bayes classifier as the leaf classifiers, can be regarded as training naive Bayes classifiers in the decision regions of the C4.5 algorithm. We propose a second division (SD) algorithm and three soft second division (SD-soft) algorithms that train classifiers in the decision regions of the naive Bayes classifier. All four algorithms generate two-level classifier trees with the naive Bayes classifier as the root classifier. The SD and SD-soft algorithms can make good use of the information contained in instances near decision boundaries, as well as in instances that the naive Bayes classifier may ignore. Finally, we conduct experiments on 30 data sets from the UC Irvine (UCI) repository. Experimental results show that, when the C4.5 algorithm and a support vector machine (SVM) are used as leaf classifiers, the SD algorithm can achieve better generalization ability than the NBTree and the averaged one-dependence estimators (AODE) algorithms. Further experiments indicate that the three SD-soft algorithms can achieve better generalization ability than the SD algorithm when their parameter values are selected appropriately.
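The abstract describes the second division idea only at a high level, so the following is a minimal, hypothetical sketch under stated assumptions, not the authors' SD algorithm. A naive Bayes root assigns every training instance to the decision region of the class it predicts, and a separate leaf classifier is trained on the instances that fall in each region; at prediction time an instance is routed through the root to the matching leaf. Assumptions: features are continuous and modeled with per-class Gaussians; the class names `NaiveBayesRoot`, `NearestNeighborLeaf`, and `SecondDivisionSketch` are illustrative (they are not scikit-learn's or the paper's); a 1-nearest-neighbor leaf stands in for the C4.5 and SVM leaves used in the paper; and the SD-soft variants are not modeled.

```python
import math
from collections import defaultdict


class NaiveBayesRoot:
    """Minimal from-scratch Gaussian naive Bayes (assumed likelihood model)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        n = len(X)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # A tiny variance floor avoids division by zero on constant features.
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                         for col, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / n), means, variances)
        return self

    def predict_one(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        # Ties go to the first class seen during fit (max keeps the first maximum).
        return max(self.stats, key=log_posterior)


class NearestNeighborLeaf:
    """Hypothetical stand-in leaf (1-NN); the paper uses C4.5 or SVM leaves."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]


class SecondDivisionSketch:
    """Hypothetical two-level tree: naive Bayes root, one leaf per decision region."""

    def __init__(self, make_leaf=NearestNeighborLeaf):
        self.make_leaf = make_leaf

    def fit(self, X, y):
        self.root = NaiveBayesRoot().fit(X, y)
        # Divide the training set by the root's predicted class (its decision regions).
        regions = defaultdict(lambda: ([], []))
        for x, label in zip(X, y):
            rx, ry = regions[self.root.predict_one(x)]
            rx.append(x)
            ry.append(label)
        self.leaves = {r: self.make_leaf().fit(rx, ry)
                       for r, (rx, ry) in regions.items()}
        return self

    def predict_one(self, x):
        region = self.root.predict_one(x)
        leaf = self.leaves.get(region)
        # Fall back to the root's prediction if no training instance reached this region.
        return leaf.predict_one(x) if leaf else region


X = [(0, 0), (1, 1), (0, 1), (1, 0)]  # XOR-style toy data
y = ["A", "A", "B", "B"]
nb = NaiveBayesRoot().fit(X, y)
sd = SecondDivisionSketch().fit(X, y)
print(sum(nb.predict_one(x) == t for x, t in zip(X, y)))  # 2 (naive Bayes alone)
print(sum(sd.predict_one(x) == t for x, t in zip(X, y)))  # 4 (two-level tree)
```

On this XOR-style data both classes have identical Gaussian statistics, so the naive Bayes root places all four points in a single region; the leaf trained inside that region recovers the labels. The toy run only illustrates the routing structure and says nothing about the results reported in the paper.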

Author information

Correspondence to Cong-fu Xu.

Additional information

Project supported by the National Natural Science Foundation of China (No. 60970081) and the National Basic Research Program (973) of China (No. 2010CB327903).

About this article

Cite this article

Yan, Zy., Xu, Cf. & Pan, Yh. Improving naive Bayes classifier by dividing its decision regions. J. Zhejiang Univ. - Sci. C 12, 647–657 (2011). https://doi.org/10.1631/jzus.C1000437

  • DOI: https://doi.org/10.1631/jzus.C1000437
