Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing

Koga, Hisashi; Ishibashi, Tetsuo; Watanabe, Toshinori

doi:10.1007/s10115-006-0027-5

Regular Paper
Published: 21 July 2006

Volume 12, pages 25–53, (2007)
Knowledge and Information Systems

Hisashi Koga¹,
Tetsuo Ishibashi¹ &
Toshinori Watanabe¹

Abstract

The single linkage method is a fundamental agglomerative hierarchical clustering algorithm. This algorithm regards each point as a single cluster initially. In the agglomeration step, it connects a pair of clusters such that the distance between the nearest members is the shortest. This step is repeated until only one cluster remains. The single linkage method can efficiently detect clusters in arbitrary shapes. However, a drawback of this method is a large time complexity of O(n²), where n represents the number of data points. This time complexity makes this method infeasible for large data. This paper proposes a fast approximation algorithm for the single linkage method. Our algorithm reduces the time complexity to O(nB) by rapidly finding the near clusters to be connected by Locality-Sensitive Hashing, a fast algorithm for the approximate nearest neighbor search. Here, B represents the maximum number of points going into a single hash entry and it practically diminishes to a small constant as compared to n for sufficiently large hash tables. Experimentally, we show that (1) the proposed algorithm obtains clustering results similar to those obtained by the single linkage method and (2) it runs faster for large data than the single linkage method.
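
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given on this page. It assumes a random-projection hash as a stand-in LSH family, a union-find structure to track clusters, and a caller-supplied list of increasing radii that defines the levels of the hierarchy. All names (`make_hash`, `merge_phase`, `lsh_single_linkage`) and parameter values (`tables`, `k`, the bucket width of `4 * r`) are hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, k, width, rng):
    """Return a function mapping a point to k quantized random projections."""
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(k)
    ]

    def h(point):
        return tuple(
            math.floor((sum(a * x for a, x in zip(vec, point)) + offset) / width)
            for vec, offset in projections
        )

    return h


def find(parent, i):
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge_phase(points, parent, r, tables=10, k=2, seed=0):
    """Merge clusters holding a pair of points within distance r.

    Only points that share a hash bucket are compared, so each point is
    checked against at most B - 1 others per table, where B is the size
    of the largest bucket.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    for _ in range(tables):
        h = make_hash(dim, k, width=4.0 * r, rng=rng)  # assumed bucket width
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            buckets[h(p)].append(idx)
        for members in buckets.values():
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    i, j = members[a], members[b]
                    ri, rj = find(parent, i), find(parent, j)
                    # The exact distance check keeps far-apart points separate.
                    if ri != rj and math.dist(points[i], points[j]) <= r:
                        parent[rj] = ri


def lsh_single_linkage(points, radii):
    """Return one list of cluster labels per radius, from fine to coarse."""
    parent = list(range(len(points)))
    levels = []
    for r in radii:
        merge_phase(points, parent, r)
        levels.append([find(parent, i) for i in range(len(points))])
    return levels
```

Calling `lsh_single_linkage(points, radii=[0.6, 20.0])` on a small list of coordinate tuples returns one labelling per radius. Because hashing can miss a nearby pair, the result only approximates exact single linkage; using more tables lowers the miss rate at the cost of extra work. Points farther apart than `r` are never joined directly, since the distance test itself is exact.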

Author information

Authors and Affiliations

Graduate School of Information Systems, University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-si, Tokyo, 182-8585, Japan
Hisashi Koga, Tetsuo Ishibashi & Toshinori Watanabe

Corresponding author

Correspondence to Hisashi Koga.

Additional information

Hisashi Koga received the M.S. and Ph.D. degrees in information science in 1995 and 2002, respectively, from the University of Tokyo. From 1995 to 2003, he worked as a researcher at Fujitsu Laboratories Ltd. Since 2003, he has been a faculty member at the University of Electro-Communications, Tokyo (Japan). Currently, he is an associate professor at the Graduate School of Information Systems, University of Electro-Communications. His research interests include various kinds of algorithms, such as clustering algorithms, on-line algorithms, and algorithms in network communications.

Tetsuo Ishibashi received the M.E. degree in information systems design from the Graduate School of Information Systems at the University of Electro-Communications in 2004. Presently, he is a system engineer at Fujitsu Broad Solution & Consulting Inc.

Toshinori Watanabe received the B.E. degree in aeronautical engineering in 1971 and the D.E. degree in 1985, both from the University of Tokyo. In 1971, he worked at Hitachi as a researcher in the field of information systems design. His experience includes demand forecasting, inventory and production management, VLSI design automation, a knowledge-based nonlinear optimizer, and a case-based evolutionary learning system nicknamed TAMPOPO. He also took part in the FGCS (Fifth Generation Computer System) project of Japan and developed a new hierarchical message-passing parallel cooperative VLSI layout problem solver that ran on PIM (Parallel Inference Machine) in 1991. Since 1992, he has been a professor at the Graduate School of Information Systems, University of Electro-Communications, Tokyo, Japan. His areas of interest include media analysis, learning intelligence, and the semantics of information systems. He is a member of the IEEE.

About this article

Cite this article

Koga, H., Ishibashi, T. & Watanabe, T. Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing. Knowl Inf Syst 12, 25–53 (2007). https://doi.org/10.1007/s10115-006-0027-5

Received: 24 March 2005
Revised: 25 February 2006
Accepted: 11 March 2006
Published: 21 July 2006
Issue Date: May 2007
DOI: https://doi.org/10.1007/s10115-006-0027-5
