Abstract
We propose a new split criterion for building classification trees. This criterion, called weighted accuracy or wacc, has the advantage that it allows the use of divide-and-conquer algorithms when minimizing the split criterion. This is useful when more complex split families, such as intervals, corners, and rectangles, are considered. The split criterion is derived to imitate the Gini function as closely as possible by comparing preference regions for the two functions. The wacc function is evaluated in a large empirical comparison and is found to be competitive with the traditionally used functions.
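The abstract's key point is that an additive criterion such as wacc can be minimized by divide-and-conquer, whereas a nonlinear criterion like Gini cannot be decomposed the same way. The sketch below illustrates the general idea only: the exact wacc formula is not given in this excerpt, so the weighted-accuracy-style score used here (majority-vote correct counts on each side, weighted by a hypothetical parameter `w`) is an assumption, not the paper's definition. Because the score depends only on running class counts, all thresholds of a sorted numeric attribute can be scanned in a single O(n) pass.

```python
def best_threshold(xs, ys, w=0.5):
    """Scan all splits of the form x <= t for a binary (0/1) class vector,
    maximizing an additive weighted-accuracy-style score. The weighting by
    w and 1 - w is an illustrative stand-in for the paper's wacc criterion."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best_score, best_t = float("-inf"), None
    left_pos = 0  # running count of positives on the left side
    for i in range(n - 1):  # candidate split between positions i and i + 1
        left_pos += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # cannot split between equal attribute values
        left_n = i + 1
        right_pos = total_pos - left_pos
        right_n = n - left_n
        # Majority-vote correct counts on each side: additive in the
        # class counts, so no per-split recomputation is needed.
        correct_left = max(left_pos, left_n - left_pos)
        correct_right = max(right_pos, right_n - right_pos)
        score = w * correct_left + (1 - w) * correct_right
        if score > best_score:
            best_score = score
            best_t = (pairs[i][0] + pairs[i + 1][0]) / 2
    return best_t, best_score
```

Under this toy score, a perfectly separable attribute yields the obvious midpoint threshold; the same one-pass structure is what makes richer split families (intervals, rectangles) tractable when the criterion is additive.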
Copyright information
© 1994 Springer-Verlag New York
Cite this paper
Lubinsky, D. (1994). Algorithmic speedups in growing classification trees by using an additive split criterion. In: Cheeseman, P., Oldford, R.W. (eds) Selecting Models from Data. Lecture Notes in Statistics, vol 89. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2660-4_44
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-94281-0
Online ISBN: 978-1-4612-2660-4