
Optimal decision trees for categorical data via integer programming

  • S.I.: GERAD-40

Journal of Global Optimization

Abstract

Decision trees have been a popular class of predictive models for decades, owing to their interpretability and their good performance on categorical features. However, they are not always robust, tend to overfit the data, and, if allowed to grow large, lose interpretability. In this paper, we present a mixed-integer programming formulation for constructing optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of feature values) at each node; numerical features can also be handled via thresholding. We show that very good accuracy can be achieved with small trees trained on moderately sized training sets, and that the resulting optimization problems are tractable with modern solvers.
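To make the central idea concrete, here is a minimal sketch of the core modeling step: learning a single combinatorial split, i.e. which subset of a categorical feature's values is routed to the left leaf, as a mixed-integer program. This is an illustration under our own assumptions, not the paper's actual formulation; the toy data, the variable names (s, zL, zR, e), and the choice of the PuLP modeling library are all ours, and the paper's model extends this idea to full trees of a prespecified size.

from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum

# Toy data: one categorical feature and binary class labels.
values = ["a", "b", "a", "c", "b", "c", "a"]
labels = [1, 1, 1, 0, 0, 0, 1]
cats = sorted(set(values))

prob = LpProblem("categorical_stump", LpMinimize)

# s[v] = 1 if samples with feature value v are routed to the left leaf.
s = {v: LpVariable(f"s_{v}", cat=LpBinary) for v in cats}
# zL, zR = the 0/1 class label predicted at the left / right leaf.
zL = LpVariable("zL", cat=LpBinary)
zR = LpVariable("zR", cat=LpBinary)
# e[i] = 1 if sample i is misclassified (forced to 1 by the constraints).
e = [LpVariable(f"e_{i}", cat=LpBinary) for i in range(len(labels))]

# Objective: minimize the number of misclassified training samples.
prob += lpSum(e)

for i, (v, y) in enumerate(zip(values, labels)):
    if y == 1:
        # Wrong if routed left while the left leaf predicts 0,
        # or routed right while the right leaf predicts 0.
        prob += e[i] >= s[v] - zL
        prob += e[i] >= (1 - s[v]) - zR
    else:
        # Wrong if routed to a leaf that predicts 1.
        prob += e[i] >= s[v] + zL - 1
        prob += e[i] >= zR - s[v]

prob.solve()
left = [v for v in cats if s[v].value() > 0.5]
print("left subset:", left,
      "| left leaf predicts:", int(zL.value()),
      "| right leaf predicts:", int(zR.value()))

On this toy data the solver selects the value subset and leaf labels that jointly minimize training errors; here one training error is unavoidable, because value "b" appears with both class labels. The paper's formulation generalizes this pattern to every internal node of a tree of prespecified depth and handles numerical features through threshold-based splits.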



Author information

Corresponding author

Correspondence to Katya Scheinberg.

Additional information


The work of Katya Scheinberg was partially supported by NSF Grant CCF-1320137. Part of this work was performed while Katya Scheinberg was on sabbatical leave at IBM Research, Google, and the University of Oxford, partially supported by the Leverhulme Trust.


About this article

Cite this article

Günlük, O., Kalagnanam, J., Li, M. et al. Optimal decision trees for categorical data via integer programming. J Glob Optim 81, 233–260 (2021). https://doi.org/10.1007/s10898-021-01009-y
