Abstract
Ensemble learning is a popular learning paradigm with applications in many diverse fields. Random Forest, a decision-tree-based ensemble learning algorithm, has received sustained attention in the research community owing to its ability to learn complex rules and generalize well to unseen data. Identifying the number of base classifiers (trees) required for a particular dataset is one of the key questions addressed in this paper. Statistical analyses of the individual base classifiers are carried out to prune the ensemble model without compromising its classification accuracy. Learning the learned model, i.e., learning the statistics of the forest in its entirety together with the information available in the dataset, can reveal the optimal thresholds for pruning an ensemble model. Experimental results show that, on average, 78% of the trees were pruned across 26 datasets obtained from the UCI repository. The impact of pruning was positive, with 22 of the 26 datasets showing equal or better classification accuracy compared with the classical Random Forest algorithm.
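The paper's full pruning pipeline is not reproduced on this page, but the underlying idea of scoring each base classifier individually and discarding the weak ones can be sketched as follows. This is a minimal illustration assuming scikit-learn; the per-tree validation-accuracy criterion and the median cutoff are hypothetical stand-ins for the learned, forest-level statistics and thresholds described in the abstract.

# Minimal sketch of pruning a random forest by per-tree statistics
# (illustrative only; not the authors' meta-learned thresholding method).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Score each base classifier (tree) individually on held-out data.
tree_scores = np.array([t.score(X_val, y_val) for t in forest.estimators_])

# Keep only trees at least as accurate as the median tree
# (a hypothetical threshold; the paper learns it from forest statistics).
keep = tree_scores >= np.median(tree_scores)
pruned = [t for t, k in zip(forest.estimators_, keep) if k]

# Majority vote over the retained trees.
votes = np.stack([t.predict(X_val) for t in pruned])
pred = (votes.mean(axis=0) >= 0.5).astype(int)
print(f"kept {len(pruned)}/100 trees, pruned accuracy = {(pred == y_val).mean():.3f}")

In practice the retained subset would be evaluated on a separate test set rather than the same validation data used for pruning; the point of the sketch is only the select-then-revote structure.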
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Dheenadayalan, K., Srinivasaraghavan, G., Muralidhara, V.N. (2016). Pruning a Random Forest by Learning a Learning Algorithm. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2016. Lecture Notes in Computer Science, vol 9729. Springer, Cham. https://doi.org/10.1007/978-3-319-41920-6_41
DOI: https://doi.org/10.1007/978-3-319-41920-6_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41919-0
Online ISBN: 978-3-319-41920-6