
Better trees: an empirical study on hyperparameter tuning of classification decision tree induction algorithms

Published in Data Mining and Knowledge Discovery

Abstract

Machine learning algorithms often have many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Because of the large number of possible hyperparameter configurations and their complex interactions, optimization techniques are commonly used to find settings that lead to high predictive performance. However, insights into how to efficiently explore this vast configuration space and how to deal with the trade-off between predictive and runtime performance remain scarce. Moreover, in some cases the default hyperparameter values are already close to a suitable configuration. In addition, for many reasons, including model validation and compliance with new legislation, there is increasing interest in interpretable models, such as those created by decision tree (DT) induction algorithms. This paper provides a comprehensive approach for investigating the effects of hyperparameter tuning on the two most widely used DT induction algorithms, CART and C4.5. DT induction algorithms provide high predictive performance and interpretable classification models, but many hyperparameters need to be adjusted. Experiments were carried out with different tuning strategies to induce models and to evaluate the relevance of the hyperparameters, using 94 classification datasets from OpenML. The experimental results show that the hyperparameter profiles obtained by tuning each algorithm provide statistically significant improvements in most of the datasets for CART, but in only one third of them for C4.5. Although different algorithms may present different tuning scenarios, the tuning techniques generally required few evaluations to find accurate solutions, and the best technique for both algorithms was Irace. Finally, we found that tuning a specific small subset of hyperparameters is a good alternative for achieving optimal predictive performance.
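The tuning setup described in the abstract can be illustrated with a short, self-contained sketch. The example below uses scikit-learn's DecisionTreeClassifier (a CART implementation) and plain random search with cross-validated balanced accuracy, one of the simpler techniques compared in the paper; the search space, the 50-evaluation budget, and the dataset are illustrative assumptions, not the paper's exact experimental protocol.

    # Hedged sketch: random-search tuning of a CART-style decision tree.
    # The hyperparameter ranges, the budget, and the dataset are assumptions
    # for illustration; the paper itself compares several tuning techniques
    # (random search, GA, PSO, EDA, SMBO, Irace) on 94 OpenML datasets.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import RandomizedSearchCV, train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

    # A small subset of CART hyperparameters, echoing the paper's finding
    # that tuning a few relevant hyperparameters can be sufficient.
    param_dist = {
        "min_samples_split": list(range(2, 51)),
        "min_samples_leaf": list(range(1, 51)),
        "max_depth": [None] + list(range(2, 31)),
        "ccp_alpha": np.linspace(0.0, 0.05, 100),  # cost-complexity pruning
    }

    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_distributions=param_dist,
        n_iter=50,                    # tuning budget (assumed)
        scoring="balanced_accuracy",  # BAC, as at the paper's tuning level
        cv=5,
        random_state=0,
    )
    search.fit(X_tr, y_tr)

    print("best hyperparameters:", search.best_params_)
    print("CV balanced accuracy: %.3f" % search.best_score_)
    print("test balanced accuracy: %.3f" % search.score(X_te, y_te))

Since the tuning algorithms are stochastic (see note 8 below), repeating such a search with several random seeds gives a more reliable picture of the attainable performance.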


Notes

  1. These techniques will be described in the following sections.

  2. The original J48 nomenclature may also be consulted at http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/J48.html.

  3. https://cran.r-project.org/web/packages/caret/index.html.

  4. Area under the ROC curve.

  5. http://www.cs.ubc.ca/labs/beta/Projects/autoweka/.

  6. http://scikit-learn.org/.

  7. https://github.com/automl/auto-sklearn.

  8. Given the stochastic nature of the commonly used tuning algorithms, experimenting with different seeds (for the random number generator) is desirable.

  9. For a complete survey on hyperparameter tuning techniques and perspectives, please consult Bischl et al. (2023).

  10. http://www.cs.waikato.ac.nz/ml/weka/.

  11. http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/J48.html.

  12. http://www.openml.org/.

  13. Initially, there were 100 datasets, but 6 of them took too long to finish their tuning jobs; they had already consumed over 1000 h when we interrupted them.

  14. https://github.com/mlr-org/mlr.

  15. https://github.com/luca-scr/GA.

  16. https://cran.r-project.org/web/packages/pso/index.html.

  17. https://github.com/yasserglez/copulaedas.

  18. https://cran.r-project.org/web/packages/RWeka/index.html.

  19. https://cran.r-project.org/web/packages/rpart/index.html.

  20. https://github.com/mlr-org/mlrMBO.

  21. https://cran.r-project.org/web/packages/randomForest/index.html.

  22. http://iridia.ulb.ac.be/irace/.

  23. The choice of budget size is discussed in more detail in Sect. 7.

  24. A population size of 10 may seem small, but it proved sufficient to provide good and accurate results, as empirically evaluated in Mantovani et al. (2016).

  25. https://github.com/automl/fanova.

  26. These additional datasets are indicated in Appendix 2.

  27. A complete list of the pymfe available meta-features can be found here: https://pymfe.readthedocs.io/en/latest/auto_pages/meta_features_description.html.

  28. The BAC measure was preferred at the tuning level because the dataset collection contains both binary and multiclass classification problems (a minimal illustration of BAC follows these notes).

  29. http://www.cs.ubc.ca/labs/beta/Projects/autoweka/.

  30. https://github.com/automl/auto-sklearn.
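As a concrete illustration of the BAC measure from note 28, the sketch below computes balanced per-class accuracy as the unweighted mean of per-class recalls, in line with Brodersen et al. (2010); the function and the toy labels are our own illustrative choices.

    # Minimal sketch of balanced per-class accuracy (BAC): the unweighted
    # mean of per-class recalls. On balanced data it equals plain accuracy;
    # on imbalanced data it penalizes majority-class bias.
    def balanced_accuracy(y_true, y_pred):
        classes = set(y_true)
        recalls = []
        for c in classes:
            idx = [i for i, t in enumerate(y_true) if t == c]
            hits = sum(1 for i in idx if y_pred[i] == c)
            recalls.append(hits / len(idx))
        return sum(recalls) / len(recalls)

    # Toy example: always predicting the majority class scores 0.9 plain
    # accuracy on this 9:1 imbalanced sample, but only 0.5 BAC.
    y_true = [0] * 9 + [1]
    y_pred = [0] * 10
    print(balanced_accuracy(y_true, y_pred))  # 0.5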

References

  • Abe S (2005) Support vector machines for pattern classification. Springer, London


  • Alcobaça E, Siqueira F, Rivolli A et al (2020) MFE: towards reproducible meta-feature extraction. J Mach Learn Res 21:111:1-111:5


  • Ali S, Smith-Miles KA (2006) A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing 70(13):173–186


  • Andradottir S (2015) A review of random search methods. In: Fu MC (ed) Handbook of simulation optimization, international series in operations research & management science, vol 216. Springer, New York, pp 277–292


  • Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Bardenet R, Brendel M, Kégl B et al (2013) Collaborative hyperparameter tuning. In: Dasgupta S, Mcallester D (eds) Proceedings of the 30th international conference on machine learning (ICML-13), vol 28. JMLR workshop and conference proceedings, pp 199–207

  • Barella VH, Garcia LPF, de Souto MCP et al (2021) Assessing the data complexity of imbalanced datasets. Inf Sci 553:83–109. https://doi.org/10.1016/j.ins.2020.12.006


  • Barros R, Basgalupp M, de Carvalho A et al (2012) A survey of evolutionary algorithms for decision-tree induction. IEEE Trans Syst Man Cybern C Appl Rev 42(3):291–312


  • Barros RC, de Carvalho ACPLF, Freitas AA (2015) Automatic design of Decision-Tree induction algorithms. Springer Briefs in computer science. Springer, Berlin. https://doi.org/10.1007/978-3-319-14231-9

  • Bartz E, Zaefferer M, Mersmann O et al (2021) Experimental investigation and evaluation of model-based hyperparameter optimization. CoRR arXiv:2107.08761

  • Ben-Hur A, Weston J (2010) A user’s guide to support vector machines. In: Data mining techniques for the life sciences, methods in molecular biology, vol 609. Humana Press, pp 223–239

  • Bendtsen C (2012) pso: Particle Swarm Optimization. https://CRAN.R-project.org/package=pso, R package version 1.0.3

  • Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305


  • Bergstra J, Yamins D, Cox DD (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In: Proceedings of the 30th international conference on machine learning, pp 1–9

  • Bergstra JS, Bardenet R, Bengio Y et al (2011) Algorithms for hyper-parameter optimization. In: Shawe-Taylor J, Zemel RS, Bartlett PL, et al (eds) Advances in neural information processing systems 24. Curran Associates, Inc., pp 2546–2554

  • Bermúdez-Chacón R, Gonnet GH, Smith K (2015) Automatic problem-specific hyperparameter optimization and model selection for supervised machine learning. Technical report, Zürich

  • Birattari M, Yuan Z, Balaprakash P et al (2010) F-race and iterated f-race: an overview. Springer, Berlin, pp 311–336. https://doi.org/10.1007/978-3-642-02538-9_13

  • Bischl B, Lang M, Kotthoff L et al (2016) mlr: Machine Learning in R. J Mach Learn Res 17(170):1–5


  • Bischl B, Binder M, Lang M et al (2023) Hyperparameter optimization: foundations, algorithms, best practices and open challenges. https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.1484

  • Blanco-Justicia A, Domingo-Ferrer J (2019) Machine learning explainability through comprehensible decision trees. In: Machine learning and knowledge extraction: third IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 international cross-domain conference, CD-MAKE 2019, Canterbury, UK, August 26–29, 2019, Proceedings. Springer, Berlin, pp 15–26. https://doi.org/10.1007/978-3-030-29726-8_2

  • Blanco-Justicia A, Domingo-Ferrer J, Martínez S et al (2020) Machine learning explainability via microaggregation and shallow decision trees. Knowl Based Syst 194(105):532. https://doi.org/10.1016/j.knosys.2020.105532


  • Brazdil P, Giraud-Carrier C, Soares C et al (2009) Metalearning: applications to data mining, 1st edn. Springer, Berlin


  • Breiman L, Friedman J, Olshen R et al (1984) Classification and regression trees. Chapman & Hall (Wadsworth, Inc.), London


  • Brodersen KH, Ong CS, Stephan KE et al (2010) The balanced accuracy and its posterior distribution. In: Proceedings of the 2010 20th international conference on pattern recognition. IEEE Computer Society, pp 3121–3124

  • Cawley GC, Talbot NLC (2010) On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 11:2079–2107


  • Clerc M (2012) Standard particle swarm optimization

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30


  • Eggensperger K, Hutter F, Hoos HH et al (2015) Efficient benchmarking of hyperparameter optimizers via surrogates. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence. AAAI Press, AAAI’15, pp 1114–1120. http://dl.acm.org/citation.cfm?id=2887007.2887162

  • Eitrich T, Lang B (2006) Efficient optimization of support vector machine learning parameters for unbalanced datasets. J Comp Appl Math 196(2):425–436


  • Esposito F, Malerba D, Semeraro G et al (1999) The effects of pruning methods on the predictive accuracy of induced decision trees. Appl Stoch Models Bus Ind 15:277–299


  • European Commission (2016) Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance). https://eur-lex.europa.eu/eli/reg/2016/679/oj

  • Falkner S, Klein A, Hutter F (2018) BOHB: robust and efficient hyperparameter optimization at scale. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on Machine Learning, Proceedings of Machine Learning Research, vol 80. PMLR, pp 1437–1446

  • Fernández-Delgado M, Cernadas E, Barro S et al (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15:3133–3181


  • Feurer M, Klein A, Eggensperger K et al (2015a) Efficient and robust automated machine learning. In: Cortes C, Lawrence ND, Lee DD, et al (eds) Advances in neural information processing systems 28. Curran Associates, Inc., pp 2944–2952

  • Feurer M, Springenberg JT, Hutter F (2015b) Initializing Bayesian hyperparameter optimization via meta-learning. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence, AAAI’15. AAAI Press, pp 1128–1135. http://dl.acm.org/citation.cfm?id=2887007.2887164

  • Feurer M, Eggensperger K, Falkner S et al (2020) Auto-sklearn 2.0: hands-free AutoML via meta-learning. arXiv:2007.04074 [cs.LG]

  • Garcia LPF, Lehmann J, de Carvalho ACPLF et al (2019) New label noise injection methods for the evaluation of noise filters. Knowl Based Syst 163:693–704. https://doi.org/10.1016/j.knosys.2018.09.031


  • Gascón-Moreno J, Salcedo-Sanz S, Ortiz-García EG et al (2011) A binary-encoded tabu-list genetic algorithm for fast support vector regression hyper-parameters tuning. In: International conference on intelligent systems design and applications, pp 1253–1257

  • Gijsbers P, Vanschoren J (2021) Gama: a general automated machine learning assistant. In: Dong Y, Ifrim G, Mladenić D et al (eds) Machine learning and knowledge discovery in databases. Applied data science and demo track. Springer, Cham, pp 560–564


  • Goldberg D (1989) Genetic algorithms in search, optimization and machine learning. Addison Wesley, London


  • Gomes TAF, Prudêncio RBC, Soares C et al (2012) Combining meta-learning and search techniques to select parameters for support vector machines. Neurocomputing 75(1):3–13


  • Gonzalez-Fernandez Y, Soto M (2014) copulaedas: an R package for estimation of distribution algorithms based on copulas. J Stat Softw 58(9):1–34


  • Hauschild M, Pelikan M (2011) An introduction and survey of estimation of distribution algorithms. Swarm Evol Comput 1(3):111–128


  • Haykin S (2007) Neural networks: a comprehensive foundation, 3rd edn. Prentice-Hall, Upper Saddle River


  • Hornik K, Buchta C, Zeileis A (2009) Open-source machine learning: R meets Weka. Comput Stat 24(2):225–232


  • Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15(3):651–674


  • Huang BF, Boutros PC (2016) The parameter sensitivity of random forests. BMC Bioinform 17(1):331. https://doi.org/10.1186/s12859-016-1228-x


  • Hutter F, Hoos H, Leyton-Brown K (2014) An efficient approach for assessing hyperparameter importance. In: Proceedings of the 31th international conference on machine learning, ICML 2014, Beijing, China, 21–26 June 2014, pp 754–762. http://jmlr.org/proceedings/papers/v32/hutter14.html

  • Jankowski D, Jackowski K (2014) Evolutionary algorithm for decision tree induction. In: Saeed K, Snášel V (eds) Computer information systems and industrial management, vol 8838. Lecture notes in computer science. Springer, Berlin, pp 23–32


  • Kuhn M, Wing J, Weston S, Williams A et al (2016) caret: classification and regression training. https://CRAN.R-project.org/package=caret, R package version 6.0-71

  • Kanda J, de Carvalho A, Hruschka E et al (2016) Meta-learning to select the best meta-heuristic for the traveling salesman problem: a comparison of meta-features. Neurocomputing 205:393–406. https://doi.org/10.1016/j.neucom.2016.04.027


  • Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, Perth, Australia, pp 1942–1948

  • Kohavi R (1996) Scaling up the accuracy of Naive–Bayes classifiers: a decision-tree hybrid. In: Second international conference on knowledge discovery and data mining, pp 202–207

  • Kotthoff L, Thornton C, Hoos HH et al (2016) Auto-WEKA 2.0: automatic model selection and hyperparameter optimization in WEKA. J Mach Learn Res 17:1–5


  • Krstajic D, Buturovic LJ, Leahy DE et al (2014) Cross-validation pitfalls when selecting and assessing regression and classification models. J Cheminform 6(1):1–15. https://doi.org/10.1186/1758-2946-6-10


  • Landwehr N, Hall M, Frank E (2005) Logistic model trees. Mach Learn 59(1–2):161–205


  • Lang M, Kotthaus H, Marwedel P et al (2015) Automatic model selection for high-dimensional survival analysis. J Stat Comput Simul 85(1):62–76. https://doi.org/10.1080/00949655.2014.929131


  • Lévesque JC, Gagné C, Sabourin R (2016) Bayesian hyperparameter optimization for ensemble learning. In: Proceedings of the thirty-second conference on uncertainty in artificial intelligence. AUAI Press, Arlington, Virginia, USA, UAI’16, pp 437–446. http://dl.acm.org/citation.cfm?id=3020948.3020994

  • Li L, Jamieson K, DeSalvo G et al (2018) Hyperband: a novel bandit-based approach to hyperparameter optimization. J Mach Learn Res 18(185):1–52


  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22


  • Lin SW, Chen SC (2012) Parameter determination and feature selection for c4.5 algorithm using scatter search approach. Soft Comput 16(1):63–75. https://doi.org/10.1007/s00500-011-0734-z


  • Loh WY (2014) Fifty years of classification and regression trees. Int Stat Rev 82(3):329–348


  • López-Ibáñez M, Dubois-Lacoste J, Cáceres LP et al (2016) The irace package: iterated racing for automatic algorithm configuration. Oper Res Perspect 3:43–58. https://doi.org/10.1016/j.orp.2016.09.002


  • Ma J (2012) Parameter tuning using Gaussian processes. Master’s thesis, University of Waikato, New Zealand

  • Mantovani RG, Horváth T, Cerri R et al (2016) Hyper-parameter tuning of a decision tree induction algorithm. In: 5th Brazilian conference on intelligent systems, BRACIS 2016, Recife, Brazil, October 9–12, 2016. IEEE Computer Society, pp 37–42. https://doi.org/10.1109/BRACIS.2016.018

  • Mantovani RG, Rossi AL, Alcobaça E et al (2019) A meta-learning recommender system for hyperparameter tuning: predicting when tuning improves SVM classifiers. Inf Sci 501:193–221. https://doi.org/10.1016/j.ins.2019.06.005


  • Massimo CM, Navarin N, Sperduti A (2016) Hyper-parameter tuning for graph kernels via multiple kernel learning. Springer, Cham, pp 214–223. https://doi.org/10.1007/978-3-319-46672-9_25

  • Mills KL, Filliben JJ, Haines AL (2015) Determining relative importance and effective settings for genetic algorithm control parameters. Evol Comput 23(2):309–342. https://doi.org/10.1162/EVCO_a_00137


  • Miranda P, Silva R, Prudêncio R (2014) Fine-tuning of support vector machine parameters using racing algorithms. In: Proceedings of the 22nd European symposium on artificial neural networks, computational intelligence and machine learning, ESANN 2014, pp 325–330

  • Molina MM, Luna JM, Romero C et al (2012) Meta-learning approach for automatic parameter tuning: a case study with educational datasets. In: Proceedings of the 5th international conference on educational data mining, EDM 2012, pp 180–183

  • Nakamura M, Otsuka A, Kimura H (2014) Automatic selection of classification algorithms for non-experts using meta-features. China-USA Bus Rev 13(3):199–205


  • Padierna LC, Carpio M, Rojas A et al (2017) Hyper-parameter tuning for support vector machines by estimation of distribution algorithms. Springer, Cham, pp 787–800


  • Pérez Cáceres L, López-Ibáñez M, Stützle T (2014) An analysis of parameters of irace. Springer, Berlin, pp 37–48. https://doi.org/10.1007/978-3-662-44320-0_4

  • Pilát M, Neruda R (2013) Multi-objectivization and surrogate modelling for neural network hyper-parameters tuning. Springer, Berlin, pp 61–66. https://doi.org/10.1007/978-3-642-39678-6_11

  • Podgorelec V, Karakatic S, Barros RC et al (2015) Evolving balanced decision trees with a multi-population genetic algorithm. In: IEEE congress on evolutionary computation, CEC 2015, Sendai, Japan, May 25–28, 2015. IEEE, pp 54–61. https://doi.org/10.1109/CEC.2015.7256874

  • Probst P, Boulesteix A, Bischl B (2019) Tunability: importance of hyperparameters of machine learning algorithms. J Mach Learn Res 20:53:1-53:32


  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco

  • Reif M, Shafait F, Dengel A (2011) Prediction of classifier training time including parameter optimization. In: Bach J, Edelkamp S (eds) KI 2011: advances in artificial intelligence, vol 7006. Lecture notes in computer science. Springer, Berlin, pp 260–271


  • Reif M, Shafait F, Dengel A (2012) Meta-learning for evolutionary parameter optimization of classifiers. Mach Learn 87:357–380


  • Reif M, Shafait F, Goldstein M et al (2014) Automatic classifier selection for non-experts. Pattern Anal Appl 17(1):83–96


  • Ribeiro MT, Singh S, Guestrin C (2016) Model-agnostic interpretability of machine learning. arXiv:1606.05386

  • Ridd P, Giraud-Carrier C (2014) Using metalearning to predict when parameter optimization is likely to improve classification accuracy. In: Vanschoren J, Brazdil P, Soares C et al (eds) Meta-learning and algorithm selection workshop at ECAI 2014, pp 18–23

  • Rokach L, Maimon O (2014) Data mining with decision trees: theory and applications, 2nd edn. World Scientific, River Edge


  • Sabharwal A, Samulowitz H, Tesauro G (2016) Selecting near-optimal learners via incremental data allocation. In: Proceedings of the thirtieth AAAI conference on artificial intelligence. AAAI Press, AAAI’16, pp 2007–2015. http://dl.acm.org/citation.cfm?id=3016100.3016179

  • Sanders S, Giraud-Carrier CG (2017) Informing the use of hyperparameter optimization through metalearning. In: 2017 IEEE International conference on data mining, ICDM 2017, New Orleans, LA, USA, November 18–21, 2017, pp 1051–1056

  • Schauerhuber M, Zeileis A, Meyer D et al (2008) Benchmarking open-source tree learners in R/RWeka. Springer, Berlin, pp 389–396. https://doi.org/10.1007/978-3-540-78246-9_46

  • Scrucca L (2013) GA: a package for genetic algorithms in R. J Stat Softw 53(4):1–37. https://doi.org/10.18637/jss.v053.i04

  • Simon D (2013) Evolutionary optimization algorithms, 1st edn. Wiley, New York


  • Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Pereira F, Burges C, Bottou L et al (eds) Advances in neural information processing systems, vol 25. Curran Associates, Inc., pp 2951–2959

  • Stiglic G, Kocbek S, Pernek I et al (2012) Comprehensive decision tree models in bioinformatics. PLoS ONE 7(3):1–13. https://doi.org/10.1371/journal.pone.0033812


  • Sun Q, Pfahringer B (2013) Pairwise meta-rules for better meta-learning-based algorithm ranking. Mach Learn 93(1):141–161. https://doi.org/10.1007/s10994-013-5387-y


  • Sureka A, Indukuri KV (2008) Using genetic algorithms for parameter optimization in building predictive data mining models. Springer, Berlin, pp 260–271. https://doi.org/10.1007/978-3-540-88192-6_25

  • Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison-Wesley Longman Publishing Co., Inc, Boston

  • Tantithamthavorn C, McIntosh S, Hassan AE et al (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th international conference on software engineering. ACM, New York, NY, USA, ICSE’16, pp 321–332. https://doi.org/10.1145/2884781.2884857

  • Therneau T, Atkinson B, Ripley B (2015) rpart: recursive partitioning and regression trees. https://CRAN.R-project.org/package=rpart, R package version 4.1-10

  • Thornton C, Hutter F, Hoos HH et al (2013) Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the KDD-2013, pp 847–855

  • van Rijn JN, Hutter F (2017) An empirical study of hyperparameter importance across datasets. In: Proceedings of the international workshop on automatic selection, configuration and composition of machine learning algorithms co-located with the european conference on machine learning & principles and practice of knowledge discovery in databases, AutoML@PKDD/ECML 2017, Skopje, Macedonia, September 22, 2017, pp 91–98. http://ceur-ws.org/Vol-1998/paper_09.pdf

  • Vanschoren J, van Rijn JN, Bischl B et al (2014) Openml: networked science in machine learning. SIGKDD Explor Newsl 15(2):49–60


  • Vieira CPR, Digiampietri LA (2020) A study about explainable artificial intelligence: using decision tree to explain SVM. Revista Brasileira de Computação Aplicada 12(1):113–121. https://doi.org/10.5335/rbca.v12i1.10247


  • Wainberg M, Alipanahi B, Frey BJ (2016) Are random forests truly the best classifiers? J Mach Learn Res 17(110):1–5


  • Wang L, Feng M, Zhou B et al (2015) Efficient hyper-parameter optimization for NLP applications. In: Màrquez L, Callison-Burch C, Su J et al (eds) Proceedings of the 2015 conference on empirical methods in natural language processing, EMNLP 2015, Lisbon, Portugal, September 17–21, 2015. The Association for Computational Linguistics, pp 2112–2117. http://aclweb.org/anthology/D/D15/D15-1253.pdf

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco


  • Wu X, Kumar V (2009) The top ten algorithms in data mining, 1st edn. Chapman & Hall/CRC, London


  • Yang XS, Cui Z, Xiao R et al (2013) Swarm intelligence and bio-inspired computation: theory and applications, 1st edn. Elsevier, Amsterdam


  • Zambrano-Bigiarini M, Clerc M, Rojas R (2013) Standard particle swarm optimisation 2011 at CEC-2013: a baseline for future PSO improvements. In: Proceedings of the IEEE congress on evolutionary computation, CEC 2013, Cancun, Mexico, June 20–23, 2013. IEEE, pp 2337–2344. https://doi.org/10.1109/CEC.2013.6557848


Acknowledgements

The authors would like to thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the financial support, the Brazilian National Council for Scientific and Technological Development (CNPq) for the grant #409371/2021-1 (CNPq/MCTI/FNDCT No 18/2021), and especially the São Paulo Research Foundation (FAPESP) for the grants #2012/23114-9, #2013/07375-0 and #2015/03986-0. EFOP-3.6.3-VEKOP-16-2017-00001: Talent Management in Autonomous Vehicle Control Technologies. The project is supported by the Hungarian Government and co-financed by the European Social Fund.

Author information


Contributions

RGM: conception of the research, acquisition of data, preparation of figures and tables, analysis and interpretation of data, drafting of the work, manuscript review. TH: conception of the research, acquisition of data, interpretation of data, drafting of the work, critical revision of the article for important intellectual content, manuscript review. ALDR: interpretation of data, preparation of figures and tables, drafting of the work, critical revision of the article for important intellectual content, manuscript review. RC: drafting of the work, critical revision of the article for important intellectual content, manuscript review. SBJ: drafting of the work, critical revision of the article for important intellectual content, manuscript review. JV: conception of the research, analysis and interpretation of data, critical revision of the article for important intellectual content, manuscript review. ACPLFdeC: conception of the research, analysis and interpretation of data, critical revision of the article for important intellectual content, manuscript review.

Corresponding author

Correspondence to Rafael Gomes Mantovani.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Responsible editor: Eyke Hüllermeier.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: List of abbreviations used in the paper

AI:

Artificial Intelligence

ANN:

Artificial Neural Network

AUC:

Area Under the ROC curve

AutoML:

Automated Machine Learning

BAC:

Balanced per class Accuracy

BOHB:

Bayesian Optimization with HyperBand

CART:

Classification and Regression Tree

CASH:

Combined Algorithm Selection and Hyper-parameter Optimization

CD:

Critical Difference

CTree:

Conditional Inference Trees

CV:

Cross-validation

DL:

Deep Learning

DT:

Decision Tree

EDA:

Estimation of Distribution Algorithm

GA:

Genetic Algorithm

GDPR:

General Data Protection Regulation

GP:

Gaussian Process

GS:

Grid Search

HP:

Hyperparameter

Irace:

Iterated F-race

kNN:

k-Nearest Neighbors

LMT:

Logistic Model Tree

LR:

Logistic Regression

ML:

Machine Learning

MtL:

Meta-learning

NB:

Naïve Bayes

NBTree:

Naïve-Bayes Tree

OpenML:

Open Machine Learning

PD:

Parametric Density

PS:

Pattern Search

PSO:

Particle Swarm Optimization

REP:

Reduced Error Pruning

RF:

Random Forest

RS:

Random Search

SH:

Shrinking Hypercube

SMBO:

Sequential Model-based Optimization

SS:

Scatter Search

SVM:

Support Vector Machine

UCI:

University of California Irvine

VTJ48:

Visual Tuning J48

Appendix 2: List of OpenML datasets used in experiments

This appendix presents the full table of datasets used in both the tuning and the meta-learning experiments performed in this paper. For each dataset, the following are shown: the OpenML dataset name and ID, the number of attributes (D), the number of examples (N), the number of classes (C), the number of examples in the majority and minority classes (nMaj, nMin), the proportion between them (P), and whether the dataset was added in the enrichment step for meta-learning (Tables 10, 11, 12, 13, 14). A short sketch of how the class-balance quantities can be computed follows the table list.

Table 10 (Multi-class) classification OpenML datasets (1 to 29) used in experiments
Table 11 (Multi-class) classification OpenML datasets (30 to 67) used in experiments
Table 12 (Multi-class) classification OpenML datasets (68 to 104) used in experiments
Table 13 (Multi-class) classification OpenML datasets (105 to 141) used in experiments
Table 14 (Multi-class) classification OpenML datasets (142 to 182) used in experiments
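The class-balance quantities reported in these tables can be computed directly from a dataset's labels. A minimal sketch follows, assuming P is the ratio nMin/nMaj; the paper's exact convention for P is given in the full text, so treat this definition as an assumption.

    # Minimal sketch: class-balance metadata as in Tables 10-14.
    # Assumes P = nMin / nMaj; the paper's exact convention may differ.
    from collections import Counter

    def class_balance(labels):
        counts = Counter(labels)
        n_maj = max(counts.values())
        n_min = min(counts.values())
        return {"C": len(counts), "nMaj": n_maj, "nMin": n_min,
                "P": n_min / n_maj}

    print(class_balance(["a"] * 70 + ["b"] * 20 + ["c"] * 10))
    # {'C': 3, 'nMaj': 70, 'nMin': 10, 'P': 0.142857...}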

Appendix 3: Hyperparameter distributions of the best solutions returned by the Irace tuning technique

See Figs. 15 and 16.

Fig. 15

Distribution of the J48 hyperparameters. Default values of the numerical hyperparameters are identified by black vertical dashed lines

Fig. 16

Distribution of the CART hyperparameters. Default values of the numerical hyperparameters are identified by black vertical dashed lines

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gomes Mantovani, R., Horváth, T., Rossi, A.L.D. et al. Better trees: an empirical study on hyperparameter tuning of classification decision tree induction algorithms. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01002-5

