Abstract
Existing estimation approaches for spatial databases often rely on the assumption that the data distribution within a small region is uniform, which seldom holds in practice. Moreover, their applicability is limited to specific estimation tasks under a particular distance metric. This paper develops the Power-method, a comprehensive technique applicable to a wide range of query optimization problems under both the L∞ and L2 metrics. The Power-method eliminates the local uniformity assumption and is therefore accurate even for datasets where existing approaches fail. Furthermore, it performs estimation by evaluating a single simple formula with minimal computational overhead. Extensive experiments confirm that the Power-method outperforms previous techniques in both accuracy and applicability to various optimization scenarios.
Cite this article
Tao, Y., Faloutsos, C. & Papadias, D. Spatial Query Estimation without the Local Uniformity Assumption. Geoinformatica 10, 261–293 (2006). https://doi.org/10.1007/s10707-006-9828-7
DOI: https://doi.org/10.1007/s10707-006-9828-7