
Assessing Cluster Quality Using Multiple Measures - A Decision Tree Based Approach

  • Conference paper
The Next Wave in Computing, Optimization, and Decision Technologies

Part of the book series: Operations Research/Computer Science Interfaces Series (ORCS, volume 29)

Abstract

Clustering is a popular data mining technique with applications in many areas. Although many clustering algorithms exist, none is superior on all datasets. These algorithms typically provide summary statistics on the generated set of clusters, but they do not provide easily interpretable, detailed descriptions of those clusters. Further, for a given dataset, different algorithms may produce different sets of clusters, so it is never clear which algorithm and which parameter settings are the most appropriate. In this paper we propose a decision tree (DT) based approach that uses multiple performance measures to indirectly assess cluster quality in order to determine the most appropriate set of clusters.
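The general idea can be illustrated with a minimal sketch. It is not the procedure from the paper, which is not shown on this page. It assumes that each candidate set of clusters is scored by fitting a small decision tree that predicts the cluster label from the attributes, and that two measures (training accuracy and tree simplicity, taken here as 1 / number of leaves) are combined with fixed weights. The functions `build_tree` and `assess`, the weights (0.7, 0.3), the toy data, and the two candidate labelings are all hypothetical choices made for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of cluster labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3):
    """Greedy axis-aligned tree: a leaf is a label, a node is
    (feature, threshold, left_subtree, right_subtree).
    Labels must not be tuples, since tuples mark internal nodes."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all rows identical, so no split is possible
        return majority
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth))

def predict(tree, row):
    """Follow splits until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[2]) + count_leaves(tree[3])

def assess(rows, labels, weights=(0.7, 0.3), max_depth=3):
    """Score one candidate clustering with two measures (assumed, not from the paper)."""
    tree = build_tree(rows, labels, max_depth=max_depth)
    accuracy = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
    leaves = count_leaves(tree)
    score = weights[0] * accuracy + weights[1] * (1.0 / leaves)
    return {"accuracy": accuracy, "leaves": leaves, "score": score}

# Toy data: two well-separated groups of points in two dimensions.
rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

# Hypothetical cluster labelings, standing in for the output of two
# different algorithms or parameter settings.
candidates = {
    "well_separated": [0, 0, 0, 1, 1, 1],
    "interleaved": [0, 1, 0, 1, 0, 1],
}

results = {name: assess(rows, labels) for name, labels in candidates.items()}
best = max(results, key=lambda name: results[name]["score"])
print(best, results[best])
```

In this sketch the labeling that a short tree can reproduce exactly receives the higher combined score. The tree also serves as a readable description of the clusters, since each leaf corresponds to a rule over the attributes. In practice the measures and their weights would need to reflect the analyst's priorities, and accuracy would usually be estimated on held-out data rather than on the training set.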




Copyright information

© 2005 Springer Science+Business Media, Inc.

About this paper

Cite this paper

Osei-Bryson, KM. (2005). Assessing Cluster Quality Using Multiple Measures - A Decision Tree Based Approach. In: Golden, B., Raghavan, S., Wasil, E. (eds) The Next Wave in Computing, Optimization, and Decision Technologies. Operations Research/Computer Science Interfaces Series, vol 29. Springer, Boston, MA. https://doi.org/10.1007/0-387-23529-9_24


  • DOI: https://doi.org/10.1007/0-387-23529-9_24

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-23528-8

  • Online ISBN: 978-0-387-23529-5

  • eBook Packages: Computer Science, Computer Science (R0)
