ABSTRACT
Naive Bayes (NB) is well known for its effectiveness and relatively high accuracy in classification tasks, but its strong assumption that attributes are mutually independent given the class reduces its predictive accuracy. To weaken this assumption, researchers have proposed allowing a limited number of interdependencies between attributes. One such approach is Tree Augmented Naive Bayes (TAN), which is regarded as the optimal 1-dependence classifier among Bayesian Network Classifiers (BNCs) because of its excellent performance. However, TAN cannot be directly extended to 2-dependence when more interdependencies between attributes need to be represented. Even when the desired dependences have been found, adding them to the structure arbitrarily may introduce cycles if their directions are not set correctly. These factors considerably limit TAN's classification accuracy. We propose applying a greedy search algorithm to the conditional mutual information matrix generated by TAN to find all significant dependences between attributes, and then using a newly defined measure to set their directions. In this way, TAN can be extended to higher-order dependence; we call the resulting classifier kTAN, where k controls the number of dependences allowed for each attribute. Empirical studies show that kTAN has a significant advantage over TAN in classification accuracy at an acceptable cost in complexity.
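The following is a minimal sketch of the general idea, not the authors' algorithm. It estimates the conditional mutual information I(Xi; Xj | C) for every attribute pair from discrete data and then greedily gives each attribute up to k parents with the highest values. The abstract does not define the measure used to set edge directions, so the sketch assumes a fixed attribute ordering as a stand-in: edges only point from earlier to later attributes, which rules out cycles. The function names, the `order` argument and the `min_cmi` threshold are all hypothetical.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xi, xj, c):
    """Estimate I(Xi; Xj | C) in nats from three equal-length discrete sequences."""
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_c = Counter(c)
    total = 0.0
    for (a, b, y), count in n_ijc.items():
        # P(a, b, y) * log( P(a, b | y) / (P(a | y) * P(b | y)) )
        total += (count / n) * log(count * n_c[y] / (n_ic[(a, y)] * n_jc[(b, y)]))
    return total


def cmi_matrix(columns, c):
    """Symmetric matrix of pairwise conditional mutual information values."""
    m = len(columns)
    cmi = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            value = conditional_mutual_information(columns[i], columns[j], c)
            cmi[i][j] = cmi[j][i] = value
    return cmi


def greedy_k_parents(cmi, order, k, min_cmi=0.0):
    """Give each attribute up to k attribute parents chosen greedily by CMI.

    Assumption: `order` is a placeholder for the paper's direction measure.
    A parent must precede its child in `order`, so the result is acyclic.
    `min_cmi` is a hypothetical cut-off for what counts as significant.
    """
    parents = {}
    for pos, child in enumerate(order):
        candidates = [p for p in order[:pos] if cmi[child][p] > min_cmi]
        candidates.sort(key=lambda p: cmi[child][p], reverse=True)
        parents[child] = candidates[:k]
    return parents


# Toy data: attribute 1 copies attribute 0; attribute 2 is independent
# of both within each class.
c = [0, 0, 0, 0, 1, 1, 1, 1]
x0 = [0, 0, 1, 1, 0, 0, 1, 1]
x1 = list(x0)
x2 = [0, 1, 0, 1, 0, 1, 0, 1]

cmi = cmi_matrix([x0, x1, x2], c)
structure = greedy_k_parents(cmi, order=[0, 1, 2], k=2, min_cmi=1e-9)
print(structure)
```

In every such structure the class variable is also a parent of each attribute, as in NB and TAN; only the attribute-to-attribute edges are selected here. With k = 1 each attribute keeps at most one attribute parent, which is the 1-dependence limit that TAN also obeys, although this greedy selection is not TAN's maximum spanning tree procedure.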
Index Terms
- Structure Extension of TAN Through Greedy Search