Estimating performance gains for voted decision trees

https://doi.org/10.1016/S1088-467X(98)00028-6

Abstract

Decision tree induction is a prominent learning method, typically yielding quick results with competitive predictive performance. However, it is not unusual to find other automated learning methods that exceed the predictive performance of a decision tree on the same application. To achieve near-optimal classification results, resampling techniques can be employed to generate multiple decision-tree solutions. These decision trees are individually applied and their answers combined by voting. The potential for exceptionally strong performance is counterbalanced by the substantial increase in computing time needed to induce many decision trees. We describe estimators of predictive performance for voted decision trees induced from bootstrap (bagged) or adaptive (boosted) resampling. The estimates are found by examining the performance of a single tree and its pruned subtrees over a single training set and a large test set. Using publicly available collections of data, we show that these estimates are usually quite accurate, with occasional weaker estimates. The great advantage of these estimates is that they reveal the predictive potential of voted decision trees prior to applying expensive computational procedures.
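The bagging-and-voting procedure the abstract refers to can be sketched in a few lines. This is an illustrative stand-in, not the paper's method: a one-threshold "stump" replaces full decision-tree induction, and the toy dataset, function names, and parameters are all assumptions made for the example.

```python
import random
from collections import Counter

def learn_stump(data):
    """Fit a one-split 'decision stump' (a stand-in for a full tree).

    data is a list of (x, label) pairs with scalar x and labels in {0, 1}.
    We try every observed x as a threshold, in both orientations, and keep
    the split with the fewest training errors.
    """
    best = None
    xs = sorted({x for x, _ in data})
    for t in xs:
        for left, right in ((0, 1), (1, 0)):
            preds = [(left if x <= t else right) for x, _ in data]
            err = sum(p != y for p, (_, y) in zip(preds, data))
            if best is None or err < best[0]:
                best = (err, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagged_vote(data, n_trees=25, seed=0):
    """Bagging: induce one classifier per bootstrap replicate, then
    classify new points by majority vote over all of them."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        # Bootstrap replicate: sample the training set with replacement.
        sample = [rng.choice(data) for _ in data]
        stumps.append(learn_stump(sample))
    def classify(x):
        votes = Counter(s(x) for s in stumps)
        return votes.most_common(1)[0][0]
    return classify

# Toy data: class 0 for x < 5, class 1 for x >= 5.
data = [(i, 0) for i in range(5)] + [(i, 1) for i in range(5, 10)]
clf = bagged_vote(data)
```

Each bootstrap replicate yields a slightly different classifier, so the voted ensemble smooths out the variance of any single induced tree; the cost the abstract highlights is that induction runs n_trees times instead of once.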


