Abstract
We show that (1) in hierarchical clustering, many linkage functions satisfy a cluster aggregate inequality, which allows an exact O(N²) multi-level implementation (using mutual nearest neighbors) of the standard O(N³) agglomerative hierarchical clustering algorithm; (2) a desirable "close friends" cohesion of clusters can be translated into kNN consistency, which is guaranteed by the multi-level algorithm; and (3) for similarity-based linkage functions, the multi-level algorithm is naturally implemented as graph contraction. The effectiveness of our algorithms is demonstrated on a number of real-life applications.
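The first claim can be illustrated with a small sketch. It assumes that, for a distance-based linkage, the cluster aggregate inequality takes the familiar reducibility form d(A∪B, C) ≥ min(d(A, C), d(B, C)), which average linkage satisfies. The function names, the Euclidean average linkage, and the no-tied-distances assumption are illustrative choices, not taken from the paper. The sketch runs the standard greedy algorithm and a multi-level variant that merges every mutual-nearest-neighbor pair at each level, so that both dendrograms can be compared. For clarity it recomputes linkages from scratch, so it does not attain the O(N²) bound.

```python
import math
from itertools import combinations


def average_linkage(a, b, pts):
    """Mean pairwise Euclidean distance between clusters a and b (sorted index tuples)."""
    a, b = sorted((a, b))  # canonical order so both algorithms sum terms identically
    total = sum(math.dist(pts[i], pts[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def greedy_agglomerate(pts):
    """Standard agglomerative clustering: merge the single closest pair per step."""
    clusters = [(i,) for i in range(len(pts))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: average_linkage(p[0], p[1], pts))
        height = average_linkage(a, b, pts)
        clusters.remove(a)
        clusters.remove(b)
        merged = tuple(sorted(a + b))
        clusters.append(merged)
        merges.append((merged, height))
    return merges


def multilevel_agglomerate(pts):
    """Multi-level variant (assumes no tied distances): each level merges
    every mutual-nearest-neighbor pair at once."""
    clusters = [(i,) for i in range(len(pts))]
    merges = []
    levels = 0
    while len(clusters) > 1:
        # Nearest neighbor of every current cluster under average linkage.
        nn = {c: min((d for d in clusters if d != c),
                     key=lambda d: average_linkage(c, d, pts))
              for c in clusters}
        # Mutual nearest neighbors choose each other, so the pairs are disjoint.
        pairs = sorted({tuple(sorted((c, nn[c]))) for c in clusters if nn[nn[c]] == c})
        for a, b in pairs:
            height = average_linkage(a, b, pts)
            clusters.remove(a)
            clusters.remove(b)
            merged = tuple(sorted(a + b))
            clusters.append(merged)
            merges.append((merged, height))
        levels += 1
    return merges, levels
```

For the four one-dimensional points (0,), (1,), (10,), (11.5,), `multilevel_agglomerate` needs 2 levels: {0, 1} and {2, 3} merge together in the first level, and the two clusters merge at height 10.25 in the second. Under the assumed inequality, merging one mutual-nearest-neighbor pair cannot bring any other cluster closer to the merged cluster than to its own nearest neighbor, which is why disjoint pairs can be merged in the same level without changing the dendrogram.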
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ding, C., He, X. (2005). Cluster Aggregate Inequality and Multi-level Hierarchical Clustering. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. PKDD 2005. Lecture Notes in Computer Science, vol 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_12
DOI: https://doi.org/10.1007/11564126_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7
eBook Packages: Computer Science, Computer Science (R0)