Abstract
Ensemble learning is a popular learning paradigm with applications in many diverse fields. Random Forest, a decision-tree-based ensemble learning algorithm, has received sustained attention in the research community owing to its ability to learn complex rules and generalize well to unseen data. Identifying the number of base classifiers (trees) required for a particular dataset is one of the key questions addressed in this paper. Statistical analyses of the individual base classifiers are carried out to prune the ensemble model without compromising its classification accuracy. Learning the learned model, i.e., learning the statistics of the forest in its entirety together with the information available in the dataset, can reveal the optimal thresholds for pruning an ensemble model. Experimental results show that, on average, 78% of the trees were pruned across 26 datasets obtained from the UCI repository. The impact of pruning was positive, with 22 of the 26 datasets showing equal or better classification accuracy compared with the classical Random Forest algorithm.
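The paper's full pruning pipeline is not reproduced on this page, but the underlying idea of scoring each base classifier individually and discarding the weak ones can be sketched as follows. This is a minimal illustration assuming scikit-learn; the per-tree validation-accuracy criterion and the median cutoff are hypothetical stand-ins for the learned, forest-level statistics and thresholds described in the abstract.

# Minimal sketch of pruning a random forest by per-tree statistics
# (illustrative only; not the authors' meta-learned thresholding method).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Score each base classifier (tree) individually on held-out data.
tree_scores = np.array([t.score(X_val, y_val) for t in forest.estimators_])

# Keep only trees at least as accurate as the median tree
# (a hypothetical threshold; the paper learns it from forest statistics).
keep = tree_scores >= np.median(tree_scores)
pruned = [t for t, k in zip(forest.estimators_, keep) if k]

# Majority vote over the retained trees.
votes = np.stack([t.predict(X_val) for t in pruned])
pred = (votes.mean(axis=0) >= 0.5).astype(int)
print(f"kept {len(pruned)}/100 trees, pruned accuracy = {(pred == y_val).mean():.3f}")

In practice the retained subset would be evaluated on a separate test set rather than the same validation data used for pruning; the point of the sketch is only the select-then-revote structure.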
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Dheenadayalan, K., Srinivasaraghavan, G., Muralidhara, V.N. (2016). Pruning a Random Forest by Learning a Learning Algorithm. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2016. Lecture Notes in Computer Science, vol 9729. Springer, Cham. https://doi.org/10.1007/978-3-319-41920-6_41
DOI: https://doi.org/10.1007/978-3-319-41920-6_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41919-0
Online ISBN: 978-3-319-41920-6