Abstract
This chapter describes an alternative, frequency-table-based method of calculating the average entropy of the training subsets resulting from splitting on an attribute. It is shown to be equivalent to the method used in Chapter 5 while requiring less computation. Two alternative attribute selection criteria, the Gini Index of Diversity and the \(\chi^{2}\) statistic, are illustrated, and it is shown how they too can be calculated using a frequency table.
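The chapter's worked examples are not reproduced here, but the frequency-table idea is easy to sketch. In the minimal Python sketch below, the attribute values and class counts are invented for illustration, and the formulas follow the standard definitions of entropy, the Gini Index and the \(\chi^{2}\) statistic rather than the chapter's own notation:

```python
import math

# Hypothetical frequency table for one attribute: each row is an
# attribute value, each column a class count (e.g. "yes" / "no").
# Names and numbers are made up for illustration.
freq = {
    "sunny":    [2, 3],
    "overcast": [4, 0],
    "rainy":    [3, 2],
}

total = sum(sum(row) for row in freq.values())  # all training instances

def entropy(counts):
    """Entropy, in bits, of a vector of class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

# Average entropy of the subsets produced by splitting on the attribute:
# each row's entropy is weighted by its share of the instances.
e_new = sum(sum(row) / total * entropy(row) for row in freq.values())

# Gini Index of the same split: weighted sum over rows of 1 - sum(p^2).
gini = sum(
    sum(row) / total * (1 - sum((c / sum(row)) ** 2 for c in row))
    for row in freq.values()
)

# Chi-square statistic from the same table: observed counts compared with
# those expected if the class were independent of the attribute value.
col_totals = [sum(col) for col in zip(*freq.values())]  # class totals
chi2 = 0.0
for row in freq.values():
    for obs, ct in zip(row, col_totals):
        expected = sum(row) * ct / total
        chi2 += (obs - expected) ** 2 / expected

print(f"average entropy = {e_new:.3f}, gini = {gini:.3f}, chi2 = {chi2:.3f}")
```

All three criteria are read off the same table of counts, which is why the frequency-table formulation avoids recomputing per-subset statistics from the raw training data.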
The important issue of inductive bias is then introduced. This leads to a description of a further attribute selection criterion, Gain Ratio, which was devised to overcome the bias of the entropy minimisation method, a bias that is undesirable for some datasets.
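Again as an illustration rather than the chapter's own derivation: under the standard definition, Gain Ratio divides the information gain of a split by its "split information", penalising attributes that fragment the training set into many small subsets. The sketch below reuses the invented table from the previous example:

```python
import math

# Same hypothetical frequency table as in the previous sketch.
freq = {"sunny": [2, 3], "overcast": [4, 0], "rainy": [3, 2]}
total = sum(sum(row) for row in freq.values())

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

# Entropy of the unsplit training set, from the class totals.
e_start = entropy([sum(col) for col in zip(*freq.values())])

# Average entropy after splitting, as before.
e_new = sum(sum(row) / total * entropy(row) for row in freq.values())

info_gain = e_start - e_new

# Split information: entropy of the subset sizes themselves; it grows
# when an attribute splits the data into many small subsets.
split_info = entropy([sum(row) for row in freq.values()])

gain_ratio = info_gain / split_info
print(f"gain = {info_gain:.3f}, gain ratio = {gain_ratio:.3f}")
```

An attribute with many distinct values (an identification-code-like attribute being the extreme case) tends to have high information gain but also high split information, so normalising by the latter counteracts the bias towards such attributes.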
Copyright information
© 2020 Springer-Verlag London Ltd., part of Springer Nature
About this chapter
Bramer, M. (2020). Decision Tree Induction: Using Frequency Tables for Attribute Selection. In: Principles of Data Mining. Undergraduate Topics in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-7493-6_6
Print ISBN: 978-1-4471-7492-9
Online ISBN: 978-1-4471-7493-6