Pruning Bagging Ensembles with Metalearning

Pinto, Fábio; Soares, Carlos; Mendes-Moreira, João

doi:10.1007/978-3-319-20248-8_6

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9132)

Included in the following conference series:

International Workshop on Multiple Classifier Systems

Abstract

Ensemble learning algorithms often benefit from pruning strategies that reduce the number of individual models and improve performance. In this paper, we propose a metalearning method for pruning bagging ensembles. Our proposal differs from other pruning strategies in that it allows the ensemble to be pruned before the individual models are actually generated. The method consists of generating a set of characteristics from the bootstrap samples and relating them to the impact of the predictive models in multiple tested combinations. We ran experiments with bagged ensembles of 20 and 100 decision trees on 53 UCI classification datasets. Results show that our method is competitive with a state-of-the-art pruning technique and with bagging, while using only 25% of the models.
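
The idea of pruning before any model is trained can be illustrated with a minimal sketch. It is not the authors' implementation: the two sample characteristics (class entropy, fraction of unique instances) and the function `placeholder_meta_score` are assumptions chosen only for illustration. In the paper, the role of the scoring function is played by a meta-model learned from many datasets. The sketch draws the bootstrap samples, describes each one with cheap characteristics, ranks them, and keeps 25% of them; only the kept samples would then be used to train decision trees.

```python
import math
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def characteristics(indices, labels):
    """Cheap descriptors of one bootstrap sample (illustrative choices only)."""
    counts = Counter(labels[i] for i in indices)
    total = len(indices)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    unique_fraction = len(set(indices)) / total
    return {"class_entropy": entropy, "unique_fraction": unique_fraction}


def placeholder_meta_score(feats):
    """Hypothetical stand-in for a learned meta-model; NOT the paper's model."""
    return feats["class_entropy"] + feats["unique_fraction"]


def select_samples(labels, n_samples=20, keep_fraction=0.25, seed=0):
    """Rank bootstrap samples by the placeholder score and keep the top share."""
    rng = random.Random(seed)
    n = len(labels)
    samples = [bootstrap_indices(n, rng) for _ in range(n_samples)]
    ranked = sorted(
        samples,
        key=lambda s: placeholder_meta_score(characteristics(s, labels)),
        reverse=True,
    )
    n_keep = max(1, round(keep_fraction * n_samples))
    return ranked[:n_keep]


# Toy label vector: 30 instances of class 0 and 20 of class 1.
labels = [0] * 30 + [1] * 20
kept = select_samples(labels)
print(len(kept))  # 5 of the 20 bootstrap samples are kept
```

Because selection happens on the samples rather than on fitted models, the cost of training the discarded 75% of the ensemble is never paid, which is the property the abstract highlights.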

Notes

1.
For computational reasons, it was possible to apply a cross-validation methodology.

Acknowledgements

This work is partially funded by FCT/MEC through PIDDAC and ERDF/ON2 within project NORTE-07-0124-FEDER-000059, a project financed by the North Portugal Regional Operational Programme (ON.2 O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF), and by national funds, through the Portuguese funding agency, Fundação para a Ciência e a Tecnologia (FCT) within project UID/EEA/50014/2013.

Author information

Authors and Affiliations

INESC TEC/Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n, 4200-465, Porto, Portugal
Fábio Pinto, Carlos Soares & João Mendes-Moreira

Corresponding author

Correspondence to Fábio Pinto.

Editor information

Editors and Affiliations

Ulm University, Ulm, Germany
Friedhelm Schwenker
University of Cagliari, Cagliari, Italy
Fabio Roli
University of Surrey, Guildford, United Kingdom
Josef Kittler

About this paper

Cite this paper

Pinto, F., Soares, C., Mendes-Moreira, J. (2015). Pruning Bagging Ensembles with Metalearning. In: Schwenker, F., Roli, F., Kittler, J. (eds) Multiple Classifier Systems. MCS 2015. Lecture Notes in Computer Science, vol 9132. Springer, Cham. https://doi.org/10.1007/978-3-319-20248-8_6

DOI: https://doi.org/10.1007/978-3-319-20248-8_6
Published: 03 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20247-1
Online ISBN: 978-3-319-20248-8
eBook Packages: Computer Science (R0)
