Abstract
In real-world problems solved with machine learning techniques, achieving small error rates is important, but in some situations an explanation of the classification is compulsory. In these situations the stability of the given explanation is crucial. We have previously presented a methodology for building classification trees, the Consolidated Trees Construction algorithm (CTC). CTC is based on subsampling techniques, so it is well suited to class-imbalance problems; it improves the error rate of standard classification trees and has greater structural stability. The built trees become steadier as the number of subsamples used for induction increases, and therefore the explanation related to the classification also becomes steadier and wider. In this paper a model is presented for estimating the number of subsamples needed to achieve a desired level of structural convergence. The values estimated with the model are very similar to the real values, with no statistically significant differences.
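The core idea the abstract describes, voting a tree's structural decisions across many subsamples so that the resulting structure stabilizes as the subsample count grows, can be illustrated with a minimal sketch. This is not the authors' CTC algorithm: the dataset, the split criterion, and the functions `best_split_feature` and `consolidated_root` are illustrative assumptions, and only the root split is consolidated.

```python
import random
from collections import Counter

random.seed(0)

# Toy two-class dataset: feature 0 separates the classes, feature 1 is noise.
data = [([random.gauss(cls, 1.0), random.gauss(0.0, 1.0)], cls)
        for cls in (0, 1) for _ in range(50)]

def best_split_feature(sample):
    """Pick the feature whose class means differ most: a crude stand-in
    for a real split criterion such as C4.5's gain ratio."""
    scores = []
    for f in range(2):
        c0 = [x[f] for x, y in sample if y == 0]
        c1 = [x[f] for x, y in sample if y == 1]
        scores.append(abs(sum(c0) / len(c0) - sum(c1) / len(c1)))
    return scores.index(max(scores))

def consolidated_root(n_subsamples, frac=0.5):
    """Vote the root split across n subsamples; with more subsamples the
    majority vote, and hence the tree structure, becomes steadier."""
    votes = Counter()
    for _ in range(n_subsamples):
        sub = random.sample(data, int(frac * len(data)))
        votes[best_split_feature(sub)] += 1
    return votes.most_common(1)[0][0]

for n in (5, 20, 80):
    print(n, "subsamples -> root splits on feature", consolidated_root(n))
```

In a full consolidation the vote would be repeated at every node while all subsamples are partitioned consistently; the paper's contribution is a model predicting how many subsamples are needed for a desired level of structural convergence.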
Copyright information

© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martín, J.I. (2005). Consolidated Trees: Classifiers with Stable Explanation. A Model to Achieve the Desired Stability in Explanation. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_11
DOI: https://doi.org/10.1007/11551188_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28757-5
Online ISBN: 978-3-540-28758-2