Abstract
Clustering has been widely used to partition data into groups so that the degree of association is high among members of the same group and low among members of different groups. Although many effective and efficient clustering algorithms have been developed and deployed, most of them still lack an automatic or online method for determining the optimal number of clusters. In this paper, we define clustering gain, a measure of clustering optimality based on the sum of squared errors accumulated as a clustering algorithm proceeds. When the measure is applied to a hierarchical clustering algorithm, an optimal number of clusters can be found. Experimental results show that the measure produces intuitively reasonable clustering configurations in Euclidean space. Furthermore, the measure can also be used to estimate the desired number of clusters for partitional clustering methods. The clustering gain measure therefore provides a promising technique for achieving higher-quality results across a wide range of clustering methods.
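The abstract's core idea, tracking a squared-error-based optimality measure as an agglomerative algorithm merges clusters, and reading off the best number of clusters, can be sketched as follows. The paper's exact clustering-gain formula is not reproduced here; as a hypothetical stand-in, this sketch records the within-cluster sum of squared errors (SSE) at every merge level of a centroid-linkage agglomeration and selects the number of clusters just before the largest SSE jump (an elbow heuristic in the same spirit). All function names are illustrative, not the authors' own.

```python
def centroid(pts):
    """Mean point of a cluster of 2-D points."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def sse(clusters):
    """Within-cluster sum of squared errors about each centroid."""
    total = 0.0
    for pts in clusters:
        cx, cy = centroid(pts)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in pts)
    return total

def agglomerate(points):
    """Centroid-linkage agglomerative clustering.

    Returns a dict mapping each cluster count k to the SSE of the
    k-cluster configuration produced along the way.
    """
    clusters = [[p] for p in points]
    sse_by_k = {len(clusters): sse(clusters)}
    while len(clusters) > 1:
        # Find the pair of clusters with the closest centroids.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci, cj = centroid(clusters[i]), centroid(clusters[j])
                d = (ci[0] - cj[0]) ** 2 + (ci[1] - cj[1]) ** 2
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        sse_by_k[len(clusters)] = sse(clusters)
    return sse_by_k

def best_k(sse_by_k):
    """Pick k just before the largest SSE increase caused by a merge
    (a stand-in for maximizing the paper's clustering-gain measure)."""
    jumps = {k: sse_by_k[k - 1] - sse_by_k[k]
             for k in sse_by_k if k - 1 in sse_by_k}
    return max(jumps, key=jumps.get)
```

On two well-separated groups of points, the merge that collapses them into a single cluster causes by far the largest SSE increase, so `best_k` recovers 2 clusters, mirroring the abstract's claim that an optimality measure tracked over a hierarchical run can expose the natural number of clusters.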
Cite this article
Jung, Y., Park, H., Du, DZ. et al. A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering. Journal of Global Optimization 25, 91–111 (2003). https://doi.org/10.1023/A:1021394316112