
Better trees: an empirical study on hyperparameter tuning of classification decision tree induction algorithms

Published in Data Mining and Knowledge Discovery

Abstract

Machine learning algorithms often have many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Because of the large number of possible hyperparameter configurations and their complex interactions, optimization techniques are commonly used to find settings that lead to high predictive performance. However, insights into how to efficiently explore this vast configuration space and how to deal with the trade-off between predictive and runtime performance remain scarce. Moreover, in some cases the default hyperparameter values are already close to a suitable configuration. In addition, for many reasons, including model validation and compliance with new legislation, there is increasing interest in interpretable models, such as those created by decision tree (DT) induction algorithms. This paper provides a comprehensive approach for investigating the effects of hyperparameter tuning on the two most widely used DT induction algorithms, CART and C4.5. DT induction algorithms provide high predictive performance and interpretable classification models, but many hyperparameters need to be adjusted. Experiments were carried out with different tuning strategies to induce models and to evaluate the relevance of the hyperparameters, using 94 classification datasets from OpenML. The experimental results show that the hyperparameter profiles obtained by tuning each algorithm provide statistically significant improvements in most of the datasets for CART, but in only one third of them for C4.5. Although different algorithms may present different tuning scenarios, the tuning techniques generally required few evaluations to find accurate solutions, and the best technique for both algorithms was Irace. Finally, we found that tuning a specific small subset of hyperparameters is a good alternative for achieving optimal predictive performance.
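The tuning setup described in the abstract can be illustrated with a short, self-contained sketch. The example below uses scikit-learn's DecisionTreeClassifier (a CART implementation) and plain random search with cross-validated balanced accuracy, one of the simpler techniques compared in the paper; the search space, the 50-evaluation budget, and the dataset are illustrative assumptions, not the paper's exact experimental protocol.

    # Hedged sketch: random-search tuning of a CART-style decision tree.
    # The hyperparameter ranges, the budget, and the dataset are assumptions
    # for illustration; the paper itself compares several tuning techniques
    # (random search, GA, PSO, EDA, SMBO, Irace) on 94 OpenML datasets.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import RandomizedSearchCV, train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

    # A small subset of CART hyperparameters, echoing the paper's finding
    # that tuning a few relevant hyperparameters can be sufficient.
    param_dist = {
        "min_samples_split": list(range(2, 51)),
        "min_samples_leaf": list(range(1, 51)),
        "max_depth": [None] + list(range(2, 31)),
        "ccp_alpha": np.linspace(0.0, 0.05, 100),  # cost-complexity pruning
    }

    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_distributions=param_dist,
        n_iter=50,                    # tuning budget (assumed)
        scoring="balanced_accuracy",  # BAC, as at the paper's tuning level
        cv=5,
        random_state=0,
    )
    search.fit(X_tr, y_tr)

    print("best hyperparameters:", search.best_params_)
    print("CV balanced accuracy: %.3f" % search.best_score_)
    print("test balanced accuracy: %.3f" % search.score(X_te, y_te))

Since the tuning algorithms are stochastic (see note 8 below), repeating such a search with several random seeds gives a more reliable picture of the attainable performance.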


Notes

  1. These techniques will be described in the following sections.

  2. The original J48 nomenclature may also be consulted at http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/J48.html.

  3. https://cran.r-project.org/web/packages/caret/index.html.

  4. Area under the ROC curve.

  5. http://www.cs.ubc.ca/labs/beta/Projects/autoweka/.

  6. http://scikit-learn.org/.

  7. https://github.com/automl/auto-sklearn.

  8. Given the stochastic nature of the commonly used tuning algorithms, experimenting with different seeds (for the random number generator) is desirable.

  9. For a complete survey on hyperparameter tuning techniques and perspectives, please consult Bischl et al. (2023).

  10. http://www.cs.waikato.ac.nz/ml/weka/.

  11. http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/J48.html.

  12. http://www.openml.org/.

  13. Initially, there were 100 datasets, but 6 of them took too long to finish their tuning jobs; they had already consumed over 1000 h when we interrupted them.

  14. https://github.com/mlr-org/mlr.

  15. https://github.com/luca-scr/GA.

  16. https://cran.r-project.org/web/packages/pso/index.html.

  17. https://github.com/yasserglez/copulaedas.

  18. https://cran.r-project.org/web/packages/RWeka/index.html.

  19. https://cran.r-project.org/web/packages/rpart/index.html.

  20. https://github.com/mlr-org/mlrMBO.

  21. https://cran.r-project.org/web/packages/randomForest/index.html.

  22. http://iridia.ulb.ac.be/irace/.

  23. The choice of budget size is discussed in more detail in Sect. 7.

  24. A population size of 10 may seem small, but it proved sufficient to provide good and accurate results, as empirically evaluated in Mantovani et al. (2016).

  25. https://github.com/automl/fanova.

  26. These additional datasets are indicated in Appendix 2.

  27. A complete list of the pymfe available meta-features can be found here: https://pymfe.readthedocs.io/en/latest/auto_pages/meta_features_description.html.

  28. The BAC measure was preferred at the tuning level because the dataset collection contains both binary and multiclass classification problems (a minimal illustration of BAC follows these notes).

  29. http://www.cs.ubc.ca/labs/beta/Projects/autoweka/.

  30. https://github.com/automl/auto-sklearn.
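As a concrete illustration of the BAC measure from note 28, the sketch below computes balanced per-class accuracy as the unweighted mean of per-class recalls, in line with Brodersen et al. (2010); the function and the toy labels are our own illustrative choices.

    # Minimal sketch of balanced per-class accuracy (BAC): the unweighted
    # mean of per-class recalls. On balanced data it equals plain accuracy;
    # on imbalanced data it penalizes majority-class bias.
    def balanced_accuracy(y_true, y_pred):
        classes = set(y_true)
        recalls = []
        for c in classes:
            idx = [i for i, t in enumerate(y_true) if t == c]
            hits = sum(1 for i in idx if y_pred[i] == c)
            recalls.append(hits / len(idx))
        return sum(recalls) / len(recalls)

    # Toy example: always predicting the majority class scores 0.9 plain
    # accuracy on this 9:1 imbalanced sample, but only 0.5 BAC.
    y_true = [0] * 9 + [1]
    y_pred = [0] * 10
    print(balanced_accuracy(y_true, y_pred))  # 0.5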

References

  • Abe S (2005) Support vector machines for pattern classification. Springer, London


  • Alcobaça E, Siqueira F, Rivolli A et al (2020) MFE: towards reproducible meta-feature extraction. J Mach Learn Res 21:111:1-111:5


  • Ali S, Smith-Miles KA (2006) A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing 70(13):173–186


  • Andradottir S (2015) A review of random search methods. In: Fu MC (ed) Handbook of simulation optimization, international series in operations research & management science, vol 216. Springer, New York, pp 277–292


  • Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Bardenet R, Brendel M, Kégl B et al (2013) Collaborative hyperparameter tuning. In: Dasgupta S, Mcallester D (eds) Proceedings of the 30th international conference on machine learning (ICML-13), vol 28. JMLR workshop and conference proceedings, pp 199–207

  • Barella VH, Garcia LPF, de Souto MCP et al (2021) Assessing the data complexity of imbalanced datasets. Inf Sci 553:83–109. https://doi.org/10.1016/j.ins.2020.12.006


  • Barros R, Basgalupp M, de Carvalho A et al (2012) A survey of evolutionary algorithms for decision-tree induction. IEEE Trans Syst Man Cybern C Appl Rev 42(3):291–312


  • Barros RC, de Carvalho ACPLF, Freitas AA (2015) Automatic design of Decision-Tree induction algorithms. Springer Briefs in computer science. Springer, Berlin. https://doi.org/10.1007/978-3-319-14231-9

  • Bartz E, Zaefferer M, Mersmann O et al (2021) Experimental investigation and evaluation of model-based hyperparameter optimization. CoRR arXiv:2107.08761

  • Ben-Hur A, Weston J (2010) A user’s guide to support vector machines. In: Data mining techniques for the life sciences, methods in molecular biology, vol 609. Humana Press, pp 223–239

  • Bendtsen C (2012) pso: Particle Swarm Optimization. https://CRAN.R-project.org/package=pso, R package version 1.0.3

  • Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305


  • Bergstra J, Yamins D, Cox DD (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In: Proceedings of the 30th international conference on machine learning, pp 1–9

  • Bergstra JS, Bardenet R, Bengio Y et al (2011) Algorithms for hyper-parameter optimization. In: Shawe-Taylor J, Zemel RS, Bartlett PL, et al (eds) Advances in neural information processing systems 24. Curran Associates, Inc., pp 2546–2554

  • Bermúdez-Chacón R, Gonnet GH, Smith K (2015) Automatic problem-specific hyperparameter optimization and model selection for supervised machine learning. Technical report, Zürich

  • Birattari M, Yuan Z, Balaprakash P et al (2010) F-race and iterated f-race: an overview. Springer, Berlin, pp 311–336. https://doi.org/10.1007/978-3-642-02538-9_13

  • Bischl B, Lang M, Kotthoff L et al (2016) mlr: Machine Learning in R. J Mach Learn Res 17(170):1–5


  • Bischl B, Binder M, Lang M et al (2023) Hyperparameter optimization: foundations, algorithms, best practices and open challenges. https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.1484

  • Blanco-Justicia A, Domingo-Ferrer J (2019) Machine learning explainability through comprehensible decision trees. In: Machine learning and knowledge extraction: third IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 international cross-domain conference, CD-MAKE 2019, Canterbury, UK, August 26–29, 2019, Proceedings. Springer, Berlin, pp 15–26. https://doi.org/10.1007/978-3-030-29726-8_2

  • Blanco-Justicia A, Domingo-Ferrer J, Martínez S et al (2020) Machine learning explainability via microaggregation and shallow decision trees. Knowl Based Syst 194(105):532. https://doi.org/10.1016/j.knosys.2020.105532


  • Brazdil P, Giraud-Carrier C, Soares C et al (2009) Metalearning: applications to data mining, 1st edn. Springer, Berlin


  • Breiman L, Friedman J, Olshen R et al (1984) Classification and regression trees. Chapman & Hall (Wadsworth, Inc.), London


  • Brodersen KH, Ong CS, Stephan KE et al (2010) The balanced accuracy and its posterior distribution. In: Proceedings of the 2010 20th international conference on pattern recognition. IEEE Computer Society, pp 3121–3124

  • Cawley GC, Talbot NLC (2010) On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 11:2079–2107


  • Clerc M (2012) Standard particle swarm optimization

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30


  • Eggensperger K, Hutter F, Hoos HH et al (2015) Efficient benchmarking of hyperparameter optimizers via surrogates. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence. AAAI Press, AAAI’15, pp 1114–1120. http://dl.acm.org/citation.cfm?id=2887007.2887162

  • Eitrich T, Lang B (2006) Efficient optimization of support vector machine learning parameters for unbalanced datasets. J Comp Appl Math 196(2):425–436


  • Esposito F, Malerba D, Semeraro G et al (1999) The effects of pruning methods on the predictive accuracy of induced decision trees. Appl Stoch Models Bus Ind 15:277–299


  • European Commission (2016) Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance). https://eur-lex.europa.eu/eli/reg/2016/679/oj

  • Falkner S, Klein A, Hutter F (2018) BOHB: robust and efficient hyperparameter optimization at scale. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on Machine Learning, Proceedings of Machine Learning Research, vol 80. PMLR, pp 1437–1446

  • Fernández-Delgado M, Cernadas E, Barro S et al (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15:3133–3181


  • Feurer M, Klein A, Eggensperger K et al (2015a) Efficient and robust automated machine learning. In: Cortes C, Lawrence ND, Lee DD, et al (eds) Advances in neural information processing systems 28. Curran Associates, Inc., pp 2944–2952

  • Feurer M, Springenberg JT, Hutter F (2015b) Initializing Bayesian hyperparameter optimization via meta-learning. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence, AAAI’15. AAAI Press, pp 1128–1135. http://dl.acm.org/citation.cfm?id=2887007.2887164

  • Feurer M, Eggensperger K, Falkner S et al (2020) Auto-sklearn 2.0: hands-free AutoML via meta-learning. arXiv:2007.04074 [cs.LG]

  • Garcia LPF, Lehmann J, de Carvalho ACPLF et al (2019) New label noise injection methods for the evaluation of noise filters. Knowl Based Syst 163:693–704. https://doi.org/10.1016/j.knosys.2018.09.031


  • Gascón-Moreno J, Salcedo-Sanz S, Ortiz-García EG et al (2011) A binary-encoded tabu-list genetic algorithm for fast support vector regression hyper-parameters tuning. In: International conference on intelligent systems design and applications, pp 1253–1257

  • Gijsbers P, Vanschoren J (2021) Gama: a general automated machine learning assistant. In: Dong Y, Ifrim G, Mladenić D et al (eds) Machine learning and knowledge discovery in databases. Applied data science and demo track. Springer, Cham, pp 560–564


  • Goldberg D (1989) Genetic algorithms in search, optimization and machine learning. Addison Wesley, London


  • Gomes TAF, Prudêncio RBC, Soares C et al (2012) Combining meta-learning and search techniques to select parameters for support vector machines. Neurocomputing 75(1):3–13


  • Gonzalez-Fernandez Y, Soto M (2014) copulaedas: an R package for estimation of distribution algorithms based on copulas. J Stat Softw 58(9):1–34


  • Hauschild M, Pelikan M (2011) An introduction and survey of estimation of distribution algorithms. Swarm Evol Comput 1(3):111–128


  • Haykin S (2007) Neural networks: a comprehensive foundation, 3rd edn. Prentice-Hall, Upper Saddle River


  • Hornik K, Buchta C, Zeileis A (2009) Open-source machine learning: R meets Weka. Comput Stat 24(2):225–232


  • Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15(3):651–674


  • Huang BF, Boutros PC (2016) The parameter sensitivity of random forests. BMC Bioinform 17(1):331. https://doi.org/10.1186/s12859-016-1228-x


  • Hutter F, Hoos H, Leyton-Brown K (2014) An efficient approach for assessing hyperparameter importance. In: Proceedings of the 31th international conference on machine learning, ICML 2014, Beijing, China, 21–26 June 2014, pp 754–762. http://jmlr.org/proceedings/papers/v32/hutter14.html

  • Jankowski D, Jackowski K (2014) Evolutionary algorithm for decision tree induction. In: Saeed K, Snášel V (eds) Computer information systems and industrial management, vol 8838. Lecture notes in computer science. Springer, Berlin, pp 23–32


  • Kuhn M, Wing J, Weston S, Williams A et al (2016) caret: classification and regression training. https://CRAN.R-project.org/package=caret, R package version 6.0-71

  • Kanda J, de Carvalho A, Hruschka E et al (2016) Meta-learning to select the best meta-heuristic for the traveling salesman problem: a comparison of meta-features. Neurocomputing 205:393–406. https://doi.org/10.1016/j.neucom.2016.04.027


  • Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, Perth, Australia, pp 1942–1948

  • Kohavi R (1996) Scaling up the accuracy of Naive–Bayes classifiers: a decision-tree hybrid. In: Second international conference on knowledge discovery and data mining, pp 202–207

  • Kotthoff L, Thornton C, Hoos HH et al (2016) Auto-WEKA 2.0: automatic model selection and hyperparameter optimization in WEKA. J Mach Learn Res 17:1–5


  • Krstajic D, Buturovic LJ, Leahy DE et al (2014) Cross-validation pitfalls when selecting and assessing regression and classification models. J Cheminform 6(1):1–15. https://doi.org/10.1186/1758-2946-6-10


  • Landwehr N, Hall M, Frank E (2005) Logistic model trees. Mach Learn 59(1–2):161–205


  • Lang M, Kotthaus H, Marwedel P et al (2015) Automatic model selection for high-dimensional survival analysis. J Stat Comput Simul 85(1):62–76. https://doi.org/10.1080/00949655.2014.929131


  • Lévesque JC, Gagné C, Sabourin R (2016) Bayesian hyperparameter optimization for ensemble learning. In: Proceedings of the thirty-second conference on uncertainty in artificial intelligence. AUAI Press, Arlington, Virginia, USA, UAI’16, pp 437–446. http://dl.acm.org/citation.cfm?id=3020948.3020994

  • Li L, Jamieson K, DeSalvo G et al (2018) Hyperband: a novel bandit-based approach to hyperparameter optimization. J Mach Learn Res 18(185):1–52


  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22


  • Lin SW, Chen SC (2012) Parameter determination and feature selection for c4.5 algorithm using scatter search approach. Soft Comput 16(1):63–75. https://doi.org/10.1007/s00500-011-0734-z


  • Loh WY (2014) Fifty years of classification and regression trees. Int Stat Rev 82(3):329–348


  • López-Ibáñez M, Dubois-Lacoste J, Cáceres LP et al (2016) The irace package: iterated racing for automatic algorithm configuration. Oper Res Perspect 3:43–58. https://doi.org/10.1016/j.orp.2016.09.002


  • Ma J (2012) Parameter tuning using Gaussian processes. Master’s thesis, University of Waikato, New Zealand

  • Mantovani RG, Horváth T, Cerri R et al (2016) Hyper-parameter tuning of a decision tree induction algorithm. In: 5th Brazilian conference on intelligent systems, BRACIS 2016, Recife, Brazil, October 9–12, 2016. IEEE Computer Society, pp 37–42. https://doi.org/10.1109/BRACIS.2016.018

  • Mantovani RG, Rossi AL, Alcobaça E et al (2019) A meta-learning recommender system for hyperparameter tuning: predicting when tuning improves SVM classifiers. Inf Sci 501:193–221. https://doi.org/10.1016/j.ins.2019.06.005


  • Massimo CM, Navarin N, Sperduti A (2016) Hyper-parameter tuning for graph kernels via multiple kernel learning. Springer, Cham, pp 214–223. https://doi.org/10.1007/978-3-319-46672-9_25

  • Mills KL, Filliben JJ, Haines AL (2015) Determining relative importance and effective settings for genetic algorithm control parameters. Evol Comput 23(2):309–342. https://doi.org/10.1162/EVCO_a_00137


  • Miranda P, Silva R, Prudêncio R (2014) Fine-tuning of support vector machine parameters using racing algorithms. In: Proceedings of the 22nd European symposium on artificial neural networks, computational intelligence and machine learning, ESANN 2014, pp 325–330

  • Molina MM, Luna JM, Romero C et al (2012) Meta-learning approach for automatic parameter tuning: a case study with educational datasets. In: Proceedings of the 5th international conference on educational data mining, EDM 2012, pp 180–183

  • Nakamura M, Otsuka A, Kimura H (2014) Automatic selection of classification algorithms for non-experts using meta-features. China-USA Bus Rev 13(3):199–205


  • Padierna LC, Carpio M, Rojas A et al (2017) Hyper-parameter tuning for support vector machines by estimation of distribution algorithms. Springer, Cham, pp 787–800


  • Pérez Cáceres L, López-Ibáñez M, Stützle T (2014) An analysis of parameters of irace. Springer, Berlin, pp 37–48. https://doi.org/10.1007/978-3-662-44320-0_4

  • Pilát M, Neruda R (2013) Multi-objectivization and surrogate modelling for neural network hyper-parameters tuning. Springer, Berlin, pp 61–66. https://doi.org/10.1007/978-3-642-39678-6_11

  • Podgorelec V, Karakatic S, Barros RC et al (2015) Evolving balanced decision trees with a multi-population genetic algorithm. In: IEEE congress on evolutionary computation, CEC 2015, Sendai, Japan, May 25–28, 2015. IEEE, pp 54–61. https://doi.org/10.1109/CEC.2015.7256874

  • Probst P, Boulesteix A, Bischl B (2019) Tunability: importance of hyperparameters of machine learning algorithms. J Mach Learn Res 20:53:1-53:32


  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco

  • Reif M, Shafait F, Dengel A (2011) Prediction of classifier training time including parameter optimization. In: Bach J, Edelkamp S (eds) KI 2011: advances in artificial intelligence, vol 7006. Lecture notes in computer science. Springer, Berlin, pp 260–271


  • Reif M, Shafait F, Dengel A (2012) Meta-learning for evolutionary parameter optimization of classifiers. Mach Learn 87:357–380


  • Reif M, Shafait F, Goldstein M et al (2014) Automatic classifier selection for non-experts. Pattern Anal Appl 17(1):83–96


  • Ribeiro MT, Singh S, Guestrin C (2016) Model-agnostic interpretability of machine learning. arXiv:1606.05386

  • Ridd P, Giraud-Carrier C (2014) Using metalearning to predict when parameter optimization is likely to improve classification accuracy. In: Vanschoren J, Brazdil P, Soares C et al (eds) Meta-learning and algorithm selection workshop at ECAI 2014, pp 18–23

  • Rokach L, Maimon O (2014) Data mining with decision trees: theory and applications, 2nd edn. World Scientific, River Edge


  • Sabharwal A, Samulowitz H, Tesauro G (2016) Selecting near-optimal learners via incremental data allocation. In: Proceedings of the thirtieth AAAI conference on artificial intelligence. AAAI Press, AAAI’16, pp 2007–2015. http://dl.acm.org/citation.cfm?id=3016100.3016179

  • Sanders S, Giraud-Carrier CG (2017) Informing the use of hyperparameter optimization through metalearning. In: 2017 IEEE International conference on data mining, ICDM 2017, New Orleans, LA, USA, November 18–21, 2017, pp 1051–1056

  • Schauerhuber M, Zeileis A, Meyer D et al (2008) Benchmarking open-source tree learners in R/RWeka. Springer, Berlin, pp 389–396. https://doi.org/10.1007/978-3-540-78246-9_46

  • Scrucca L (2013) GA: a package for genetic algorithms in R. J Stat Softw 53(4):1–37. https://doi.org/10.18637/jss.v053.i04

  • Simon D (2013) Evolutionary optimization algorithms, 1st edn. Wiley, New York


  • Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Pereira F, Burges C, Bottou L et al (eds) Advances in neural information processing systems, vol 25. Curran Associates, Inc., pp 2951–2959

  • Stiglic G, Kocbek S, Pernek I et al (2012) Comprehensive decision tree models in bioinformatics. PLoS ONE 7(3):1–13. https://doi.org/10.1371/journal.pone.0033812


  • Sun Q, Pfahringer B (2013) Pairwise meta-rules for better meta-learning-based algorithm ranking. Mach Learn 93(1):141–161. https://doi.org/10.1007/s10994-013-5387-y


  • Sureka A, Indukuri KV (2008) Using genetic algorithms for parameter optimization in building predictive data mining models. Springer, Berlin, pp 260–271. https://doi.org/10.1007/978-3-540-88192-6_25

  • Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison-Wesley Longman Publishing Co., Inc, Boston

  • Tantithamthavorn C, McIntosh S, Hassan AE et al (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th international conference on software engineering. ACM, New York, NY, USA, ICSE’16, pp 321–332. https://doi.org/10.1145/2884781.2884857

  • Therneau T, Atkinson B, Ripley B (2015) rpart: recursive partitioning and regression trees. https://CRAN.R-project.org/package=rpart, R package version 4.1-10

  • Thornton C, Hutter F, Hoos HH et al (2013) Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the KDD-2013, pp 847–855

  • van Rijn JN, Hutter F (2017) An empirical study of hyperparameter importance across datasets. In: Proceedings of the international workshop on automatic selection, configuration and composition of machine learning algorithms co-located with the european conference on machine learning & principles and practice of knowledge discovery in databases, AutoML@PKDD/ECML 2017, Skopje, Macedonia, September 22, 2017, pp 91–98. http://ceur-ws.org/Vol-1998/paper_09.pdf

  • Vanschoren J, van Rijn JN, Bischl B et al (2014) Openml: networked science in machine learning. SIGKDD Explor Newsl 15(2):49–60


  • Vieira CPR, Digiampietri LA (2020) A study about explainable artificial intelligence: using decision tree to explain SVM. Revista Brasileira de Computação Aplicada 12(1):113–121. https://doi.org/10.5335/rbca.v12i1.10247


  • Wainberg M, Alipanahi B, Frey BJ (2016) Are random forests truly the best classifiers? J Mach Learn Res 17(110):1–5


  • Wang L, Feng M, Zhou B et al (2015) Efficient hyper-parameter optimization for NLP applications. In: Màrquez L, Callison-Burch C, Su J et al (eds) Proceedings of the 2015 conference on empirical methods in natural language processing, EMNLP 2015, Lisbon, Portugal, September 17–21, 2015. The Association for Computational Linguistics, pp 2112–2117. http://aclweb.org/anthology/D/D15/D15-1253.pdf

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco


  • Wu X, Kumar V (2009) The top ten algorithms in data mining, 1st edn. Chapman & Hall/CRC, London


  • Yang XS, Cui Z, Xiao R et al (2013) Swarm intelligence and bio-inspired computation: theory and applications, 1st edn. Elsevier, Amsterdam


  • Zambrano-Bigiarini M, Clerc M, Rojas R (2013) Standard particle swarm optimisation 2011 at CEC-2013: a baseline for future PSO improvements. In: Proceedings of the IEEE congress on evolutionary computation, CEC 2013, Cancun, Mexico, June 20–23, 2013. IEEE, pp 2337–2344. https://doi.org/10.1109/CEC.2013.6557848


Acknowledgements

The authors would like to thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the financial support, the Brazilian National Council for Scientific and Technological Development (CNPq) for the grant #409371/2021-1 (CNPq/MCTI/FNDCT No 18/2021), and especially the São Paulo Research Foundation (FAPESP) for the grants #2012/23114-9, #2013/07375-0 and #2015/03986-0. EFOP-3.6.3-VEKOP-16-2017-00001: Talent Management in Autonomous Vehicle Control Technologies. The project is supported by the Hungarian Government and co-financed by the European Social Fund.

Author information


Contributions

RGM: conception of the research, acquisition of data, preparation of figures and tables, analysis and interpretation of data, drafting of the work, manuscript review. TH: conception of the research, acquisition of data, interpretation of data, drafting of the work, critical revision of the article for important intellectual content, manuscript review. ALDR: interpretation of data, preparation of figures and tables, drafting of the work, critical revision of the article for important intellectual content, manuscript review. RC: drafting of the work, critical revision of the article for important intellectual content, manuscript review. SBJ: drafting of the work, critical revision of the article for important intellectual content, manuscript review. JV: conception of the research, analysis and interpretation of data, critical revision of the article for important intellectual content, manuscript review. ACPLFdeC: conception of the research, analysis and interpretation of data, critical revision of the article for important intellectual content, manuscript review.

Corresponding author

Correspondence to Rafael Gomes Mantovani.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Responsible editor: Eyke Hüllermeier.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: List of abbreviations used in the paper

AI:

Artificial Intelligence

ANN:

Artificial Neural Network

AUC:

Area Under the ROC curve

AutoML:

Automated Machine Learning

BAC:

Balanced per class Accuracy

BOHB:

Bayesian Optimization with HyperBand

CART:

Classification and Regression Tree

CASH:

Combined Algorithm Selection and Hyper-parameter Optimization

CD:

Critical Difference

CTree:

Conditional Inference Trees

CV:

Cross-validation

DL:

Deep Learning

DT:

Decision Tree

EDA:

Estimation of Distribution Algorithm

GA:

Genetic Algorithm

GDPR:

General Data Protection Regulation

GP:

Gaussian Process

GS:

Grid Search

HP:

Hyperparameter

Irace:

Iterated F-race

kNN:

k-Nearest Neighbors

LMT:

Logistic Model Tree

LR:

Logistic Regression

ML:

Machine Learning

MtL:

Meta-learning

NB:

Naïve Bayes

NBTree:

Naïve-Bayes Tree

OpenML:

Open Machine Learning

PD:

Parametric Density

PS:

Pattern Search

PSO:

Particle Swarm Optimization

REP:

Reduced Error Pruning

RF:

Random Forest

RS:

Random Search

SH:

Shrinking Hypercube

SMBO:

Sequential Model-based Optimization

SS:

Scatter Search

SVM:

Support Vector Machine

UCI:

University of California Irvine

VTJ48:

Visual Tuning J48

Appendix 2: List of OpenML datasets used in experiments

This appendix presents the full table of datasets used in both the tuning and the meta-learning experiments performed in this paper. For each dataset, the following are shown: the OpenML dataset name and ID, the number of attributes (D), the number of examples (N), the number of classes (C), the number of examples in the majority and minority classes (nMaj, nMin), the proportion between them (P), and whether the dataset was added in the enrichment step for meta-learning (Tables 10, 11, 12, 13, 14). A short sketch of how the class-balance quantities can be computed follows the table list.

Table 10 (Multi-class) classification OpenML datasets (1 to 29) used in experiments
Table 11 (Multi-class) classification OpenML datasets (30 to 67) used in experiments
Table 12 (Multi-class) classification OpenML datasets (68 to 104) used in experiments
Table 13 (Multi-class) classification OpenML datasets (105 to 141) used in experiments
Table 14 (Multi-class) classification OpenML datasets (142 to 182) used in experiments
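The class-balance quantities reported in these tables can be computed directly from a dataset's labels. A minimal sketch follows, assuming P is the ratio nMin/nMaj; the paper's exact convention for P is given in the full text, so treat this definition as an assumption.

    # Minimal sketch: class-balance metadata as in Tables 10-14.
    # Assumes P = nMin / nMaj; the paper's exact convention may differ.
    from collections import Counter

    def class_balance(labels):
        counts = Counter(labels)
        n_maj = max(counts.values())
        n_min = min(counts.values())
        return {"C": len(counts), "nMaj": n_maj, "nMin": n_min,
                "P": n_min / n_maj}

    print(class_balance(["a"] * 70 + ["b"] * 20 + ["c"] * 10))
    # {'C': 3, 'nMaj': 70, 'nMin': 10, 'P': 0.142857...}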

Appendix 3: Hyperparameter distributions of the best solutions returned by the Irace tuning technique

See Figs. 15 and 16.

Fig. 15

Distribution of the J48 hyperparameters. Default values of the numerical hyperparameters are identified by black vertical dashed lines

Fig. 16

Distribution of the CART hyperparameters. Default values of the numerical hyperparameters are identified by black vertical dashed lines

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gomes Mantovani, R., Horváth, T., Rossi, A.L.D. et al. Better trees: an empirical study on hyperparameter tuning of classification decision tree induction algorithms. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01002-5

