Fast hierarchical clustering and other applications of dynamic closest pairs

Published: 31 December 2000
Abstract

We develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log² n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Gröbner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.
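
To make the problem setting concrete, here is a minimal Python sketch of a dynamic closest-pair interface over an arbitrary distance function, followed by its use in greedy bottom-up clustering. It is not the data structure developed in the paper: it is a naive baseline in which every object caches its nearest neighbor, so an insertion evaluates n distances but a deletion can trigger a rescan for many objects and cost quadratic time in the worst case. The names `NaiveClosestPair` and `agglomerate`, the assumption that the distance is symmetric, and the choice of single linkage as the cluster distance are all illustrative assumptions, not taken from the source.

```python
import math


class NaiveClosestPair:
    """Naive baseline (hypothetical name): each object caches its nearest neighbor.

    Assumes `dist` is symmetric and that objects are hashable and distinct.
    """

    def __init__(self, dist):
        self.dist = dist
        self.nearest = {}  # object -> (distance, neighbor); (inf, None) when alone

    def _rescan(self, x):
        # Recompute x's nearest neighbor by scanning every other object.
        best = (math.inf, None)
        for y in self.nearest:
            if y != x:
                d = self.dist(x, y)
                if d < best[0]:
                    best = (d, y)
        self.nearest[x] = best

    def insert(self, x):
        # One pass: find x's neighbor and let x improve existing cached neighbors.
        best = (math.inf, None)
        for y, (d_y, _) in list(self.nearest.items()):
            d = self.dist(x, y)
            if d < d_y:
                self.nearest[y] = (d, x)
            if d < best[0]:
                best = (d, y)
        self.nearest[x] = best

    def delete(self, x):
        del self.nearest[x]
        # Every object that pointed at x needs a fresh scan (the costly case).
        for y, (_, neighbor) in list(self.nearest.items()):
            if neighbor == x:
                self._rescan(y)

    def closest_pair(self):
        # Requires at least two stored objects.
        x = min(self.nearest, key=lambda o: self.nearest[o][0])
        d, y = self.nearest[x]
        return d, x, y


def agglomerate(points, point_dist):
    """Greedy bottom-up clustering; returns the list of merges as (distance, a, b)."""

    def cluster_dist(a, b):
        # Single linkage, chosen only for illustration.
        return min(point_dist(p, q) for p in a for q in b)

    cp = NaiveClosestPair(cluster_dist)
    for p in points:
        cp.insert(frozenset([p]))

    merges = []
    while len(cp.nearest) > 1:
        d, a, b = cp.closest_pair()
        cp.delete(a)
        cp.delete(b)
        cp.insert(a | b)  # the merged cluster becomes a new object
        merges.append((d, a, b))
    return merges


if __name__ == "__main__":
    for d, a, b in agglomerate([0, 10, 13, 14], lambda p, q: abs(p - q)):
        print(d, sorted(a), sorted(b))
```

Each clustering step is two deletions and one insertion, so the cost of the whole procedure is governed by the update time of the closest-pair structure. That is why faster update bounds of the kind stated in the abstract matter for this application.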

        • Published in

          ACM Journal of Experimental Algorithmics, Volume 5 (2000)
          418 pages
          ISSN: 1084-6654
          EISSN: 1084-6654
          DOI: 10.1145/351827

          Copyright © 2000 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • article
