A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering

  • Published in: Journal of Global Optimization

Abstract

Clustering partitions data into groups so that the degree of association is high among members of the same group and low among members of different groups. Although many effective and efficient clustering algorithms have been developed and deployed, most of them lack an automatic, online criterion for deciding the optimal number of clusters. In this paper, we define clustering gain, a measure of clustering optimality based on the sum of squared errors as a clustering algorithm proceeds. When the measure is applied to a hierarchical clustering algorithm, an optimal number of clusters can be found. Experimental results show that the measure produces intuitively reasonable clustering configurations in Euclidean space. Furthermore, the measure can also be used to estimate the desired number of clusters for partitional clustering methods. The clustering gain measure therefore offers a promising technique for improving the quality of a wide range of clustering methods.
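To make the idea concrete, the following is a minimal sketch of monitoring a squared-error-based gain across the cut levels of an agglomerative hierarchy and returning the level at which the gain peaks. It is not the authors' exact formulation, which is developed in the full paper: the specific gain formula used here (the sum over clusters of (n_j - 1) * ||m_j - m||^2, with m the global centroid and m_j the centroid of cluster j), the choice of Ward linkage, and the use of SciPy are assumptions made purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def clustering_gain(X, labels):
    """Squared-error-based gain: sum over clusters of (n_j - 1) * ||m_j - m||^2,
    where m is the global centroid and m_j the centroid of cluster j.
    NOTE: this form is an assumption for the sketch, not the paper's definition;
    it vanishes for one all-inclusive cluster and for all-singleton clusters,
    so it peaks at some intermediate cut level of the dendrogram."""
    m = X.mean(axis=0)
    total = 0.0
    for lab in np.unique(labels):
        pts = X[labels == lab]
        total += (len(pts) - 1) * ((pts.mean(axis=0) - m) ** 2).sum()
    return total


def estimate_k(X, max_k=20):
    """Cut a Ward dendrogram at every k up to max_k; return the k maximizing the gain."""
    Z = linkage(X, method="ward")  # agglomerative hierarchy over the observations
    best_k, best_gain = 1, -np.inf
    for k in range(1, min(max_k, len(X)) + 1):
        labels = fcluster(Z, t=k, criterion="maxclust")  # labels for a k-cluster cut
        g = clustering_gain(X, labels)
        if g > best_gain:
            best_k, best_gain = k, g
    return best_k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # three well-separated Gaussian blobs in the plane
    centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
    X = np.vstack([rng.normal(c, 0.4, size=(60, 2)) for c in centers])
    print("estimated number of clusters:", estimate_k(X))  # expected: 3
```

Because the assumed gain is zero both for a single all-inclusive cluster and for all-singleton clusters, scanning the dendrogram levels yields an interior maximum, which plays the role of the estimated optimal number of clusters; the same scan could be used to seed a partitional method such as k-means.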

Cite this article

Jung, Y., Park, H., Du, DZ. et al. A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering. Journal of Global Optimization 25, 91–111 (2003). https://doi.org/10.1023/A:1021394316112
