
Consolidated trees versus bagging when explanation is required

Abstract

In some real-world problems solved by machine learning, the solution must be comprehensible so that the correct decision can be made. In this context, this paper compares bagging (one of the most widely used multiple classifier systems) with the consolidated trees construction (CTC) algorithm when the learning problem requires the classification to be accompanied by an explanation. To address the comprehensibility shortcomings of bagging, Domingos' proposal, combining multiple models (CMM), is used to extract a single comprehensible model from the ensemble. The two algorithms are compared from three main points of view: accuracy, the quality of the explanation that accompanies the classification, and computational cost. The results show that CTC is the better choice when an explanation is required: it has greater discriminating capacity than the explanation-extraction algorithm added to bagging, the explanation it provides is of higher quality, simpler and more reliable, and CTC is computationally more efficient.
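To make the comparison concrete, the sketch below illustrates the general idea behind the CMM-style approach applied to bagging: train an ensemble, then relabel the training data with the ensemble's own predictions and fit a single decision tree that mimics it, so that one comprehensible tree can stand in for the vote of many. This is only a minimal sketch assuming scikit-learn and a standard benchmark dataset; it does not reproduce the paper's experimental setup (nor the artificial examples Domingos generates to enlarge the relabelled set, nor the CTC algorithm itself).

```python
# Minimal CMM-style sketch (assumed setup: scikit-learn, breast cancer data):
# a bagged tree ensemble is accurate but opaque; a single tree trained on the
# ensemble's own predictions recovers a comprehensible surrogate model.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: 50 trees, each grown on a bootstrap sample, combined by voting.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0).fit(X_train, y_train)

# CMM-style surrogate: relabel the training set with the ensemble's
# predictions and fit one shallow tree to mimic it.  (Domingos' original CMM
# also adds artificially generated examples labelled by the ensemble;
# omitted here for brevity.)
surrogate = DecisionTreeClassifier(max_depth=5, random_state=0)
surrogate.fit(X_train, bagging.predict(X_train))

print("ensemble accuracy :", bagging.score(X_test, y_test))
print("surrogate accuracy:", surrogate.score(X_test, y_test))
# Fidelity: how often the single tree reproduces the ensemble's decision.
print("fidelity          :", surrogate.score(X_test, bagging.predict(X_test)))
```

By contrast, CTC builds a single tree directly, with several subsamples voting on each split of the same tree, so no separate surrogate-extraction step is needed to obtain an explainable classifier.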

References

  1. Aamodt A, Plaza E (1994) Case-based reasoning: foundational issues, methodological variations, and system approaches. Artif Intell Commun 7(1): 39–52

  2. Andonova S, Elisseeff A, Evgeniou T, Pontil M (2002) A simple algorithm for learning stable machines. In: Proceedings of the European conference on artificial intelligence, pp 513–517

  3. Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine. http://www.ics.uci.edu/~mlearn/MLRepository.html

  4. Banfield RE, Hall LO, Bowyer KW, Bhadoria D, Kegelmeyer WP, Eschrich S (2004) A comparison of ensemble creation techniques. In: The fifth international conference on multiple classifier systems. Cagliari, Italy, pp 223–232

  5. Banfield RE, Hall LO, Bowyer KW, Kegelmeyer WP (2007) A comparison of decision tree ensemble creation techniques. IEEE Trans Pattern Anal Mach Intell 29: 173–180

  6. Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach Learn 36: 105–139

  7. Breiman L (1996) Bagging predictors. Mach Learn 24: 123–140

  8. Chawla NV, Hall LO, Bowyer KW, Kegelmeyer WP (2004) Learning ensembles from bites: a scalable and accurate approach. J Mach Learn Res 5: 421–451

  9. Craven WM (1996) Extracting comprehensible models from trained neural networks. PhD thesis, University of Wisconsin, Madison

  10. Wall R, Cunningham P, Walsh P (2002) Explaining predictions from a neural network ensemble one at a time. In: Proceedings of the 6th European conference on principles of data mining and knowledge discovery, pp 449–460

  11. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7: 1–30

  12. Dietterich TG (1997) Machine learning research: four current directions. AI Mag 18(4): 97–136

  13. Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40: 139–157

  14. Domingos P (1997) Knowledge acquisition from examples via multiple models. In: Proceedings of 14th international conference on machine learning, Nashville, pp 98–106

  15. Drummond C, Holte RC (2000) Exploiting the cost (in)sensitivity of decision tree splitting criteria. In: Proceedings of the 17th international conference on Machine Learning, pp 239–246

  16. Dwyer K, Holte R (2007) Decision tree instability and active learning. In: Proceedings of the 18th European conference on machine learning, ECML, pp 128–139

  17. Elisseeff A, Evgeniou T, Pontil M, Kaelbling P (2005) Stability of randomized learning algorithms. J Mach Learn Res 6: 55–79

  18. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Proceedings of the 13th international conference on machine learning, pp 148–156

  19. García S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9: 2677–2694

  20. Gurrutxaga I, Pérez JM, Arbelaitz O, Martín JI, Muguerza J (2006) Analysis of the performance of a parallel implementation for the CTC algorithm. In: Workshop on state-of-the-art in Scientific and Parallel Computing (PARA’06), Umea, Sweden

  21. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, Berlin. ISBN 0-387-95284-5

  22. Johansson U, Niklasson L, König R (2004) Accuracy vs. comprehensibility in data mining models. In: The 7th international conference on information fusion, Stockholm, Sweden

  23. Mease D, Wyner AJ, Buja A (2007) Boosted classification trees and class probability/quantile estimation. J Mach Learn Res 8: 409–439

  24. Núñez H, Angulo C, Català A (2002) Rule extraction from support vector machines. In: ESANN’2002, proceedings of the European symposium on artificial neural networks, Bruges, Belgium, pp 107–112

  25. Opitz D, Maclin R (1999) Popular ensemble methods: an empirical study. JAIR 11: 169–198

  26. Paliouras G, Brée DS (1995) The effect of numeric features on the scalability of inductive learning programs. In: 8th European conference on machine learning (ECML), Greece. LNCS, vol 912, pp 218–231

  27. Pérez JM, Muguerza J, Arbelaitz O, Gurrutxaga I, Martín JI et al (2006) Consolidated trees: an analysis of structural convergence, LNAI 3755. In: Graham JW (eds) Data mining: theory, methodology, techniques, and applications. Springer, Berlin, pp 39–52

  28. Pérez JM (2006) Árboles consolidados: construcción de un árbol de clasificación basado en múltiples submuestras sin renunciar a la explicación [Consolidated trees: building a classification tree based on multiple subsamples without giving up the explanation]. PhD thesis, University of the Basque Country, Donostia

  29. Pérez JM, Muguerza J, Arbelaitz O, Gurrutxaga I, Martín JI (2007) Combining multiple class distribution modified subsamples in a single tree. Pattern Recognit Lett 28(4): 414–422

  30. Provost F, Jensen D, Oates T (1999) Efficient progressive sampling. In: Proceedings of 5th international conference on knowledge discovery and data mining. AAAI Press, Menlo Park, pp 23–32

  31. Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo

  32. Schank R (1982) Dynamic memory: a theory of learning in computers and people. Cambridge University Press, New York

  33. Setiono R, Leow WK, Zurada JM (2002) Extraction of rules from artificial neural networks for nonlinear regression. IEEE Trans Neural Netw 13(3): 564–577

  34. Skurichina M, Kuncheva LI, Duin RPW (2002) Bagging and boosting for the nearest mean classifier: effects of sample size on diversity and accuracy. In: Multiple classifier systems: proceedings of the 3rd international workshop, MCS. LNCS, vol 2364. Cagliari, Italy, pp 62–71

  35. Turney P (1995) Bias and the quantification of stability. Mach Learn 20: 23–33

  36. Windeatt T, Ardeshir G (2002) Boosted tree ensembles for solving multiclass problems. In: Multiple classifier systems: proceedings of the 3rd international workshop, MCS. LNCS, vol 2364. Cagliari, Italy, pp 42–51

  37. Xu L, Krzyzak A, Suen CY (1992) Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans Syst Man Cybern SMC-22(3): 418–435

  38. Yao YY, Zhao Y, Maguire RB (2003) Explanation oriented association mining using rough set theory. In: Proceedings of the 9th international conference rough sets, fuzzy sets, data mining, and granular computing, (RSFDGrC, 2003), LNAI, vol 2639, pp 165–172

  39. Yao YY, Zhao Y, Maguire RB (2003) Explanation oriented association mining using combination of unsupervised and supervised learning algorithms. In: Advances in artificial intelligence, proceedings of the 16th conference of the Canadian Society for Computational Studies of Intelligence (AI 2003), LNAI, vol 2671, pp 527–532

  40. Zenobi G, Cunningham P (2002) An approach to aggregating ensembles of lazy learners that supports explanation. In: Advances in case-based reasoning, 6th European conference ECCBR, pp 436–447

Author information

Corresponding author

Correspondence to Olatz Arbelaitz.

Additional information

Communicated by R. Neruda.

Cite this article

Pérez, J.M., Albisua, I., Arbelaitz, O. et al. Consolidated trees versus bagging when explanation is required. Computing 89, 113–145 (2010). https://doi.org/10.1007/s00607-010-0094-z
