Pruning Bagging Ensembles with Metalearning

  • Conference paper

Multiple Classifier Systems (MCS 2015)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9132)

Abstract

Ensemble learning algorithms often benefit from pruning strategies that reduce the number of individual models and improve performance. In this paper, we propose a metalearning method for pruning bagging ensembles. Our proposal differs from other pruning strategies in that it allows the ensemble to be pruned before the individual models are actually generated. The method consists of generating a set of characteristics from the bootstrap samples and relating them to the impact of the predictive models in multiple tested combinations. We ran experiments with bagged ensembles of 20 and 100 decision trees on 53 UCI classification datasets. Results show that our method is competitive with a state-of-the-art pruning technique and with bagging, while using only 25% of the models.
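
To make the pre-pruning idea concrete, the sketch below (Python with scikit-learn, not the authors' code) shows the general shape of such a procedure: bootstrap samples are drawn and characterized before any base learner is trained, a metamodel scores each sample, and only the top 25% of samples are used to build trees. The metafeatures, the `metamodel` argument, and the function names are illustrative assumptions rather than the exact characteristics or meta-level setup used in the paper.

```python
# Minimal sketch (assumed, not the paper's implementation) of pruning a
# bagging ensemble BEFORE training the base models, using a metamodel
# that scores bootstrap samples from their characteristics.
import numpy as np
from scipy.stats import entropy
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def metafeatures(X_boot, y_boot, X_full, y_full):
    # Toy characteristics of one bootstrap sample (illustrative only):
    # KL divergence between the bootstrap and full class distributions,
    # and the fraction of distinct rows drawn. Labels are assumed to be
    # encoded as integers 0..k-1.
    n_classes = int(y_full.max()) + 1
    full_dist = np.bincount(y_full, minlength=n_classes) / len(y_full)
    boot_dist = np.bincount(y_boot, minlength=n_classes) / len(y_boot)
    kl = entropy(boot_dist + 1e-12, full_dist + 1e-12)
    uniq = len(np.unique(X_boot, axis=0)) / len(X_full)
    return np.array([kl, uniq])

def prune_before_training(X, y, metamodel, n_models=100, keep=0.25):
    # Draw all bootstrap samples first, score each one with a previously
    # trained metamodel (any fitted regressor mapping metafeatures to
    # expected model usefulness), and train decision trees only for the
    # top `keep` fraction of samples.
    n = len(X)
    samples = [rng.integers(0, n, size=n) for _ in range(n_models)]
    feats = np.array([metafeatures(X[idx], y[idx], X, y) for idx in samples])
    scores = metamodel.predict(feats)
    top = np.argsort(scores)[::-1][: int(keep * n_models)]
    return [DecisionTreeClassifier().fit(X[samples[i]], y[samples[i]])
            for i in top]
```

In this sketch the metamodel is assumed to have been trained beforehand on pairs of bootstrap-sample characteristics and measured contributions of the corresponding models to ensemble accuracy, gathered from other datasets; any fitted scikit-learn regressor with a `predict` method can be passed in.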

Notes

  1. For computational reasons, it was not possible to apply a cross-validation methodology.

Acknowledgements

This work is partially funded by FCT/MEC through PIDDAC and ERDF/ON2 within project NORTE-07-0124-FEDER-000059, a project financed by the North Portugal Regional Operational Programme (ON.2 O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF), and by national funds, through the Portuguese funding agency, Fundação para a Ciência e a Tecnologia (FCT) within project UID/EEA/50014/2013.

Author information

Corresponding author

Correspondence to Fábio Pinto.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Pinto, F., Soares, C., Mendes-Moreira, J. (2015). Pruning Bagging Ensembles with Metalearning. In: Schwenker, F., Roli, F., Kittler, J. (eds) Multiple Classifier Systems. MCS 2015. Lecture Notes in Computer Science, vol 9132. Springer, Cham. https://doi.org/10.1007/978-3-319-20248-8_6

  • DOI: https://doi.org/10.1007/978-3-319-20248-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20247-1

  • Online ISBN: 978-3-319-20248-8

  • eBook Packages: Computer Science, Computer Science (R0)
