Estimating performance gains for voted decision trees

https://doi.org/10.1016/S1088-467X(98)00028-6

Abstract

Decision tree induction is a prominent learning method, typically yielding quick results with competitive predictive performance. However, it is not unusual to find other automated learning methods that exceed the predictive performance of a decision tree on the same application. To achieve near-optimal classification results, resampling techniques can be employed to generate multiple decision-tree solutions. These decision trees are individually applied and their answers combined by voting. The potential for exceptionally strong performance is counterbalanced by the substantial increase in computing time needed to induce many decision trees. We describe estimators of predictive performance for voted decision trees induced from bootstrap (bagged) or adaptive (boosted) resampling. The estimates are found by examining the performance of a single tree and its pruned subtrees over a single training set and a large test set. Using publicly available collections of data, we show that these estimates are usually quite accurate, with occasional weaker estimates. The great advantage of these estimates is that they reveal the predictive potential of voted decision trees prior to applying expensive computational procedures.
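The bagging-and-voting procedure the abstract refers to can be sketched in a few lines. This is an illustrative stand-in, not the paper's method: a one-threshold "stump" replaces full decision-tree induction, and the toy dataset, function names, and parameters are all assumptions made for the example.

```python
import random
from collections import Counter

def learn_stump(data):
    """Fit a one-split 'decision stump' (a stand-in for a full tree).

    data is a list of (x, label) pairs with scalar x and labels in {0, 1}.
    We try every observed x as a threshold, in both orientations, and keep
    the split with the fewest training errors.
    """
    best = None
    xs = sorted({x for x, _ in data})
    for t in xs:
        for left, right in ((0, 1), (1, 0)):
            preds = [(left if x <= t else right) for x, _ in data]
            err = sum(p != y for p, (_, y) in zip(preds, data))
            if best is None or err < best[0]:
                best = (err, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagged_vote(data, n_trees=25, seed=0):
    """Bagging: induce one classifier per bootstrap replicate, then
    classify new points by majority vote over all of them."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        # Bootstrap replicate: sample the training set with replacement.
        sample = [rng.choice(data) for _ in data]
        stumps.append(learn_stump(sample))
    def classify(x):
        votes = Counter(s(x) for s in stumps)
        return votes.most_common(1)[0][0]
    return classify

# Toy data: class 0 for x < 5, class 1 for x >= 5.
data = [(i, 0) for i in range(5)] + [(i, 1) for i in range(5, 10)]
clf = bagged_vote(data)
```

Each bootstrap replicate yields a slightly different classifier, so the voted ensemble smooths out the variance of any single induced tree; the cost the abstract highlights is that induction runs n_trees times instead of once.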


