Learning Small Decision Trees with Large Domain

Learning Small Decision Trees with Large Domain

Eduard Eiben, Sebastian Ordyniak, Giacomo Paesani, Stefan Szeider

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 3184-3192. https://doi.org/10.24963/ijcai.2023/355

One favors decision trees (DTs) of the smallest size or depth to facilitate explainability and interpretability. However, learning such an optimal DT from data is well-known to be NP-hard. To overcome this complexity barrier, Ordyniak and Szeider (AAAI 21) initiated the study of optimal DT learning under the parameterized complexity perspective. They showed that solution size (i.e., number of nodes or depth of the DT) is insufficient to obtain fixed-parameter tractability (FPT). Therefore, they proposed an FPT algorithm that utilizes two auxiliary parameters: the maximum difference (as a structural property of the data set) and maximum domain size. They left it as an open question of whether bounding the maximum domain size is necessary. The main result of this paper answers this question. We present FPT algorithms for learning a smallest or lowest-depth DT from data, with the only parameters solution size and maximum difference. Thus, our algorithm is significantly more potent than the one by Szeider and Ordyniak as it can handle problem inputs with features that range over unbounded domains. We also close several gaps concerning the quality of approximation one obtains by only considering DTs based on minimum support sets.
Keywords:
Knowledge Representation and Reasoning: KRR: Computational complexity of reasoning
Machine Learning: ML: Explainable/Interpretable machine learning