
Pruning a Random Forest by Learning a Learning Algorithm

  • Conference paper
  • First Online:
Machine Learning and Data Mining in Pattern Recognition (MLDM 2016)

Abstract

Ensemble learning is a popular learning paradigm with applications in many diverse fields. Random Forest, a decision-tree-based ensemble learning algorithm, has received constant attention in the research community due to its ability to learn complex rules and generalize well to unseen data. Identifying the number of base classifiers (trees) required for a particular dataset is one of the key questions addressed in this paper. Statistical analyses of the individual base classifiers are carried out to prune the ensemble without compromising its classification accuracy. Learning the learned model, i.e., learning the statistics of the forest as a whole together with the information available in the dataset, can reveal the optimal thresholds for pruning an ensemble model. Experimental results show that, on average, 78% of the trees were pruned across 26 datasets obtained from the UCI repository. The impact of pruning was positive, with 22 of the 26 datasets showing equal or better classification accuracy compared with the classical Random Forest algorithm.
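The pruning idea described in the abstract can be illustrated with a minimal accuracy-threshold filter. This is a hedged sketch, not the authors' algorithm: the paper learns dataset-specific thresholds from statistics of the whole forest, whereas the helper names `prune_ensemble` and `majority_vote` and the fixed threshold below are purely illustrative.

```python
def majority_vote(predictions_per_tree):
    """Combine per-tree class predictions for one sample by majority vote."""
    votes = {}
    for p in predictions_per_tree:
        votes[p] = votes.get(p, 0) + 1
    return max(votes, key=votes.get)

def prune_ensemble(trees, tree_accuracies, threshold):
    """Keep only the trees whose individual validation accuracy meets
    the threshold; the surviving subset still predicts by majority vote."""
    return [t for t, acc in zip(trees, tree_accuracies) if acc >= threshold]

# Illustrative use: three trees with per-tree validation accuracies.
kept = prune_ensemble(["t1", "t2", "t3"], [0.9, 0.5, 0.8], threshold=0.7)
```

In this sketch `kept` retains only `"t1"` and `"t3"`; the paper's contribution is precisely that the threshold is not fixed by hand but learned from the forest's statistics and the dataset's properties.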



Author information

Correspondence to Kumar Dheenadayalan.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Dheenadayalan, K., Srinivasaraghavan, G., Muralidhara, V.N. (2016). Pruning a Random Forest by Learning a Learning Algorithm. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2016. Lecture Notes in Computer Science, vol. 9729. Springer, Cham. https://doi.org/10.1007/978-3-319-41920-6_41


  • DOI: https://doi.org/10.1007/978-3-319-41920-6_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41919-0

  • Online ISBN: 978-3-319-41920-6

  • eBook Packages: Computer Science, Computer Science (R0)
