research-article

I/O-efficient batched union-find and its applications to terrain analysis

Published: 08 December 2010

Abstract

In this article we present an I/O-efficient algorithm for the batched (off-line) version of the union-find problem. Given any sequence of N union and find operations, where each union operation joins two distinct sets, our algorithm uses O(SORT(N)) = O((N/B) log_{M/B} (N/B)) I/Os, where M is the memory size and B is the disk block size. This bound is asymptotically optimal in the worst case. If there are union operations that join a set with itself, our algorithm uses O(SORT(N) + MST(N)) I/Os, where MST(N) is the number of I/Os needed to compute the minimum spanning tree of a graph with N edges. We also describe a simple and practical O(SORT(N) log(N/M))-I/O algorithm for this problem, which we have implemented.
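To make the problem statement concrete, the following is a minimal in-memory sketch of what a batched union-find instance looks like and of how the SORT(N) term can be evaluated for given N, M, and B. It is not the I/O-efficient algorithm of the article. The function names, the tuple encoding of operations, and the convention that the representative of the first argument survives a union are all assumptions made for illustration.

```python
import math


def sort_io_bound(n, m, b):
    """Evaluate (N/B) * log_{M/B}(N/B), the SORT(N) term with constants dropped."""
    blocks = n / b
    return blocks * math.log(blocks, m / b)


def batched_union_find(ops):
    """Answer a whole batch of operations in memory (reference semantics only).

    ops is a list of ("union", x, y) or ("find", x) tuples (assumed encoding).
    Returns one representative per find, in the order the finds appear.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    answers = []
    for op in ops:
        if op[0] == "union":
            rx, ry = find(op[1]), find(op[2])
            if rx != ry:
                parent[ry] = rx  # assumed convention: first set's root survives
        else:
            answers.append(find(op[1]))
    return answers


ops = [("union", 1, 2), ("find", 2), ("union", 3, 4),
       ("union", 2, 3), ("find", 4), ("find", 5)]
print(batched_union_find(ops))
print(sort_io_bound(2**30, 2**20, 2**10))
```

Because the whole operation sequence is known in advance, a batched algorithm is free to reorder and group the work, which is what allows an external-memory solution to rely on sorting rather than on random accesses to a pointer structure.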

We are interested in the union-find problem because of its applications in terrain analysis. A terrain can be abstracted as a height function defined over R^2, and many problems that deal with such functions require a union-find data structure. With the emergence of modern mapping technologies, huge amounts of elevation data are being generated that are too large to fit in memory, so I/O-efficient algorithms are needed to process these data efficiently. In this article, we study two terrain-analysis problems that benefit from a union-find data structure: (i) computing topological persistence and (ii) constructing the contour tree. We give the first O(SORT(N))-I/O algorithms for these two problems, assuming that the input terrain is represented as a triangular mesh with N vertices.
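As an illustration of why union-find appears in terrain analysis, the sketch below computes the minimum-saddle persistence pairs of a height function on a graph with the standard in-memory sweep: vertices are processed in increasing height, and when two components merge, the one with the higher minimum is paired with the merging vertex. This is a textbook-style sketch under stated assumptions (distinct heights, the mesh given as an edge list, hypothetical function and parameter names), not the I/O-efficient algorithm of the article.

```python
def persistence_pairs(heights, edges):
    """Pair each non-global minimum with the vertex that merges its component.

    heights: dict mapping vertex -> height (heights assumed distinct).
    edges:   iterable of (u, v) pairs, e.g. the edges of a triangular mesh.
    Returns a list of (birth_height, death_height) pairs.
    """
    neighbors = {v: [] for v in heights}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for v in sorted(heights, key=heights.get):
        parent[v] = v  # v starts as its own component
        for u in neighbors[v]:
            if u not in parent:
                continue  # u is higher than v and has not been swept yet
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Invariant: each root is the lowest vertex of its component.
            young, old = (ru, rv) if heights[ru] > heights[rv] else (rv, ru)
            parent[young] = old
            if young != v:  # skip the trivial pair created by v itself
                pairs.append((heights[young], heights[v]))
    return pairs


# A 1-D profile a-b-c-d-e with minima at a, c, and e.
profile = {"a": 1, "b": 5, "c": 2, "d": 4, "e": 3}
path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(persistence_pairs(profile, path))
```

The global minimum is never paired, since its component survives the whole sweep. The sequence of union and find calls issued here depends only on the sorted vertex order and the mesh connectivity, which is what makes a batched formulation of the problem natural.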

    • Published in

      ACM Transactions on Algorithms  Volume 7, Issue 1
      November 2010
      282 pages
      ISSN:1549-6325
      EISSN:1549-6333
      DOI:10.1145/1868237

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 8 December 2010
      • Accepted: 1 December 2007
      • Received: 1 November 2007

      Qualifiers

      • research-article
      • Research
      • Refereed
