Abstract
Clustering is a popular data mining technique with applications in many areas. Although there are many clustering algorithms, none is superior on all datasets. Typically, these algorithms provide summary statistics on the generated set of clusters but do not provide easily interpretable, detailed descriptions of those clusters. Further, for a given dataset, different algorithms may give different sets of clusters, so it is never clear which algorithm and which parameter settings are the most appropriate. In this paper we propose a decision tree (DT) based approach that uses multiple performance measures to assess cluster quality indirectly, in order to determine the most appropriate set of clusters.
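As a rough illustration of the idea only, and not the chapter's actual procedure, the sketch below treats each candidate clustering as a set of class labels, fits a small decision tree to predict those labels from the original attributes, and reports two tree-based measures. The toy data, the greedy Gini-based tree, and the choice of measures (training accuracy and number of leaves) are all assumptions made for this example; the abstract does not say which measures the chapter uses or how it combines them.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, max_depth=2):
    """Greedy binary tree on numeric features (illustrative, not the chapter's DT).
    A leaf is a cluster label; an internal node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are always non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = sum(len(side) * gini([labels[i] for i in side])
                        for side in (left, right))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no feature has more than one distinct value
        return majority
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx],
                          [labels[i] for i in idx], max_depth - 1)

    return (f, t, grow(left), grow(right))

def predict(tree, row):
    """Follow splits until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[2]) + count_leaves(tree[3])

def assess(rows, labels, max_depth=2):
    """Two assumed measures: training accuracy and tree size (leaf count)."""
    tree = build_tree(rows, labels, max_depth)
    hits = sum(predict(tree, r) == y for r, y in zip(rows, labels))
    return {"accuracy": hits / len(rows), "leaves": count_leaves(tree)}

# Hypothetical toy data: two well-separated groups of points.
rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
candidate_a = [0, 0, 0, 1, 1, 1]  # clustering that follows the groups
candidate_b = [0, 1, 0, 1, 0, 1]  # clustering that ignores them

measures_a = assess(rows, candidate_a)
measures_b = assess(rows, candidate_b)
print("A:", measures_a)
print("B:", measures_b)
```

Under these assumptions, a clustering that a small tree can reproduce accurately is also one that can be described by a few readable rules, which is the kind of interpretability the abstract says summary statistics alone do not give.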
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this paper
Cite this paper
Osei-Bryson, KM. (2005). Assessing Cluster Quality Using Multiple Measures - A Decision Tree Based Approach. In: Golden, B., Raghavan, S., Wasil, E. (eds) The Next Wave in Computing, Optimization, and Decision Technologies. Operations Research/Computer Science Interfaces Series, vol 29. Springer, Boston, MA. https://doi.org/10.1007/0-387-23529-9_24
DOI: https://doi.org/10.1007/0-387-23529-9_24
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-23528-8
Online ISBN: 978-0-387-23529-5
eBook Packages: Computer Science (R0)