Iterative spatial join

Published: 01 September 2003

Abstract

The key issue in performing spatial joins is finding the pairs of intersecting rectangles. For unindexed data sets, this is usually resolved by partitioning the data and then performing a plane sweep on the individual partitions. The resulting join can be viewed as a two-step process where the partition corresponds to a hash-based join while the plane-sweep corresponds to a sort-merge join. In this article, we look at extending the idea of the sort-merge join for one-dimensional data to multiple dimensions and introduce the Iterative Spatial Join. As with the sort-merge join, the Iterative Spatial Join is best suited to cases where the data is already sorted. However, as we show in the experiments, the Iterative Spatial Join performs well when internal memory is limited, compared to the partitioning methods. This suggests that the Iterative Spatial Join would be useful for very large data sets or in situations where internal memory is a shared resource and is therefore limited, such as with today's database engines which share internal memory amongst several queries. Furthermore, the performance of the Iterative Spatial Join is predictable and has no parameters which need to be tuned, unlike other algorithms. The Iterative Spatial Join is based on a plane sweep algorithm, which requires the entire data set to fit in internal memory. When internal memory overflows, the Iterative Spatial Join simply makes additional passes on the data, thereby exhibiting only a gradual performance degradation. To demonstrate the use and efficacy of the Iterative Spatial Join, we first examine and analyze current approaches to performing spatial joins, and then give a detailed analysis of the Iterative Spatial Join as well as present the results of extensive testing of the algorithm, including a comparison with partitioning-based spatial join methods. These tests show that the Iterative Spatial Join overcomes the performance limitations of the other algorithms for data sets of all sizes as well as differing amounts of internal memory.
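
The plane sweep that the abstract refers to can be illustrated with a small in-memory sketch. This is not the Iterative Spatial Join itself, and it does not model the additional passes made when memory overflows; it only shows the basic idea of sorting rectangles by their lower x-coordinate and comparing each one against the rectangles of the other set that the sweep line still crosses. The `Rect` layout, the function name `plane_sweep_join`, and the choice to treat touching rectangles as intersecting are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layout: (xmin, ymin, xmax, ymax), with closed intervals on both axes.
Rect = Tuple[float, float, float, float]


def plane_sweep_join(a: List[Rect], b: List[Rect]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) such that a[i] intersects b[j]."""
    # Tag every rectangle with its source set (0 = a, 1 = b) and its index,
    # then sort all of them by xmin: this is the order the sweep line visits them.
    events = [(r[0], 0, i, r) for i, r in enumerate(a)]
    events += [(r[0], 1, j, r) for j, r in enumerate(b)]
    events.sort(key=lambda e: e[0])

    # Per input set, the rectangles the sweep line may still cross.
    active = ([], [])
    pairs = []
    for xmin, side, idx, rect in events:
        other = 1 - side
        # Discard rectangles of the other set that end left of the sweep line;
        # they cannot intersect this rectangle or any later one.
        active[other][:] = [(k, r) for k, r in active[other] if r[2] >= xmin]
        # The survivors overlap in x, so only the y-extents need to be tested.
        for k, r in active[other]:
            if r[1] <= rect[3] and rect[1] <= r[3]:
                pairs.append((idx, k) if side == 0 else (k, idx))
        active[side].append((idx, rect))
    return pairs


if __name__ == "__main__":
    set_a = [(0, 0, 2, 2), (5, 5, 6, 6)]
    set_b = [(1, 1, 3, 3), (2.5, 0, 4, 1), (5.5, 5.5, 7, 7)]
    print(sorted(plane_sweep_join(set_a, set_b)))
```

The active lists here are plain Python lists, which keeps the sketch short but makes each pruning step linear in the number of active rectangles; a real implementation would typically use a more efficient sweep structure.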
