Abstract
In this article we present an I/O-efficient algorithm for the batched (off-line) version of the union-find problem. Given any sequence of N union and find operations, where each union operation joins two distinct sets, our algorithm uses O(SORT(N)) = O((N/B) log_{M/B}(N/B)) I/Os, where M is the memory size and B is the disk block size. This bound is asymptotically optimal in the worst case. If there are union operations that join a set with itself, our algorithm uses O(SORT(N) + MST(N)) I/Os, where MST(N) is the number of I/Os needed to compute the minimum spanning tree of a graph with N edges. We also describe a simple and practical O(SORT(N) log(N/M))-I/O algorithm for this problem, which we have implemented.
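To make the batched problem statement concrete, the following Python sketch runs a sequence of union and find operations with an ordinary in-memory union-find (union by size, path halving). It only illustrates the problem's semantics and is not the I/O-efficient algorithm of this article. The operation encoding and the names `DisjointSets` and `run_batched` are hypothetical. The sketch also flags unions whose two operands already lie in the same set. If the i-th union is read as a graph edge of weight i, these are exactly the edges that Kruskal's algorithm would reject, which suggests why a minimum spanning tree computation enters the bound when such unions are allowed.

```python
class DisjointSets:
    """In-memory union-find with union by size and path halving."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:  # unseen element: create a singleton set
            self.parent[x] = x
            self.size[x] = 1
            return x
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return False if they were already one set."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree below the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def run_batched(ops):
    """Process ("union", a, b) and ("find", a) operations in sequence order.

    Returns (answers, redundant): answers holds the representative returned
    by each find, and redundant holds the indices of unions that joined a
    set with itself.
    """
    ds = DisjointSets()
    answers, redundant = [], []
    for i, op in enumerate(ops):
        if op[0] == "union":
            if not ds.union(op[1], op[2]):
                redundant.append(i)
        elif op[0] == "find":
            answers.append(ds.find(op[1]))
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return answers, redundant
```

For the batch `[("union", 1, 2), ("find", 1), ("find", 3), ("union", 2, 3), ("union", 1, 3), ("find", 3)]`, the union at index 4 is reported as redundant, and the first and last finds return the same representative.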
We are interested in the union-find problem because of its applications in terrain analysis. A terrain can be abstracted as a height function defined over ℝ², and many problems that deal with such functions require a union-find data structure. With the emergence of modern mapping technologies, huge amounts of elevation data are being generated that are too large to fit in main memory, so I/O-efficient algorithms are needed to process them. In this article, we study two terrain-analysis problems that benefit from a union-find data structure: (i) computing topological persistence and (ii) constructing the contour tree. We give the first O(SORT(N))-I/O algorithms for these two problems, assuming that the input terrain is represented as a triangular mesh with N vertices.
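One way to see why union-find appears in terrain analysis is the standard in-memory sweep for 0-dimensional persistence. This sweep processes vertices from low to high and pairs each local minimum with the height at which its sublevel-set component merges into an older one (the elder rule). The Python sketch below assumes distinct vertex heights and takes the terrain as a vertex-to-height map plus the edges of the triangulation. The function name `zero_dim_persistence` is hypothetical. This is not the article's O(SORT(N))-I/O algorithm, which is needed precisely when the mesh does not fit in memory, and it covers only the minimum-saddle part of persistence.

```python
def zero_dim_persistence(heights, edges):
    """Return sorted (birth, death) pairs for sublevel-set components.

    heights: dict mapping vertex -> height (heights assumed distinct)
    edges:   iterable of (u, v) vertex pairs, e.g. the edges of a terrain mesh
    Components that never merge into an older one get death = inf.
    """
    adj = {v: [] for v in heights}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = {}  # union-find forest over the vertices swept so far
    oldest = {}  # root -> lowest vertex (the minimum) of that component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for v in sorted(heights, key=heights.get):  # sweep from low to high
        parent[v] = v
        oldest[v] = v
        for u in adj[v]:
            if u not in parent:  # u is higher and has not been swept yet
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Elder rule: make ru the younger component (higher minimum).
            if heights[oldest[ru]] < heights[oldest[rv]]:
                ru, rv = rv, ru
            young = oldest[ru]
            if young != v:  # skip the zero-length pair when v joins a component
                pairs.append((heights[young], heights[v]))
            parent[ru] = rv  # the older root survives with its minimum
    for r in {find(v) for v in heights}:
        pairs.append((heights[oldest[r]], float("inf")))
    return sorted(pairs)
```

On the path a-b-c-d-e with heights 0, 3, 1, 4, 2, the sketch returns `[(0, inf), (1, 3), (2, 4)]`: the minima at c and e die at the saddles b and d, while the global minimum a never dies.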
Recommendations
I/O-efficient batched union-find and its applications to terrain analysis
SCG '06: Proceedings of the Twenty-Second Annual Symposium on Computational Geometry. Despite extensive study over the last four decades and numerous applications, no I/O-efficient algorithm is known for the union-find problem. In this paper we present an I/O-efficient algorithm for the batched (off-line) version of the union-find ...