Abstract
We develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an <i>n</i>-object set, maintaining the closest pair, in <i>O</i>(<i>n</i> log<sup>2</sup> <i>n</i>) time per update and <i>O</i>(<i>n</i>) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound of <i>O</i>(<i>n</i>) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Gröbner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.
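To make the problem statement concrete, the following is a minimal Python sketch of a naive baseline for dynamic closest pairs under an arbitrary distance function. It is not either of the data structures described in the abstract: it only caches each object's nearest neighbor and rescans when that neighbor is deleted. The class name `NaiveClosestPair`, its method names, and the assumption that the distance function is symmetric are illustrative choices, not taken from the article.

```python
class NaiveClosestPair:
    """Hypothetical baseline: every object caches its nearest neighbor.

    Assumes dist(p, q) == dist(q, p) and that objects are hashable.
    """

    def __init__(self, dist):
        self.dist = dist
        self.nn = {}  # object -> (distance, neighbor), or None if alone

    def _recompute(self, p):
        # Scan all other objects to find p's nearest neighbor: O(n) distances.
        best = None
        for q in self.nn:
            if q != p:
                d = self.dist(p, q)
                if best is None or d < best[0]:
                    best = (d, q)
        self.nn[p] = best

    def insert(self, p):
        self.nn[p] = None
        self._recompute(p)
        # The new object may be closer to q than q's cached neighbor.
        for q in self.nn:
            if q != p:
                d = self.dist(p, q)
                cur = self.nn[q]
                if cur is None or d < cur[0]:
                    self.nn[q] = (d, p)

    def delete(self, p):
        del self.nn[p]
        # Only objects that pointed at p need a fresh scan.
        for q in list(self.nn):
            if self.nn[q] is not None and self.nn[q][1] == p:
                self._recompute(q)

    def closest_pair(self):
        # Returns (distance, a, b), or None with fewer than two objects.
        best = None
        for p, entry in self.nn.items():
            if entry is not None and (best is None or entry[0] < best[0]):
                best = (entry[0], p, entry[1])
        return best


if __name__ == "__main__":
    cp = NaiveClosestPair(lambda a, b: abs(a - b))
    for x in (0, 10, 11, 30):
        cp.insert(x)
    print(cp.closest_pair())
    cp.delete(11)
    print(cp.closest_pair())
```

In this sketch an insertion costs a linear number of distance evaluations, but a deletion can trigger a rescan for every object that pointed at the deleted one, so its worst case is quadratic. Bounding the cost of every update is the gap that the structures described in the abstract are meant to close.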
Supplemental Material
The software suite accompanying the article; this is a small Unix tar file, which includes the source code, a Makefile, and the test files used in the article.
Index Terms
- Fast hierarchical clustering and other applications of dynamic closest pairs