Abstract
Nearest-neighbor queries in high-dimensional space are important in many applications, especially content-based indexing of multimedia data. Optimizing query processing requires accurate models for estimating query processing costs. In this paper, we propose a new cost model for nearest-neighbor queries in high-dimensional space, which we apply to enhance the performance of high-dimensional index structures. The model is based on new insights into effects occurring in high-dimensional space and provides a closed formula for the processing cost of nearest-neighbor queries as a function of the dimensionality, the block size, and the database size. From the wide range of possible applications of our model, we select two examples: first, we use the model to prove the known linear complexity of the nearest-neighbor search problem in high-dimensional space; second, we provide a technique for optimizing the block size. For data of medium dimensionality, the optimized block size yields significant speed-ups in query processing time compared with traditional block sizes and with the linear scan.
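As an illustration of the linear-scan baseline the abstract compares against, the following minimal sketch finds the nearest neighbor by brute force and counts how many disk blocks a sequential scan would read. It is not the paper's cost model, whose closed formula is not reproduced in the abstract. The function names, the 4-byte coordinate size, and the assumption that a point never spans two blocks are all hypothetical choices made for this example.

```python
import math


def linear_scan_nn(points, query):
    """Return (index, distance) of the point closest to `query` (Euclidean)."""
    best_index, best_dist_sq = -1, math.inf
    for i, point in enumerate(points):
        # Squared distance is enough for comparison; take the root once at the end.
        dist_sq = sum((a - b) ** 2 for a, b in zip(point, query))
        if dist_sq < best_dist_sq:
            best_index, best_dist_sq = i, dist_sq
    return best_index, math.sqrt(best_dist_sq)


def blocks_read_by_scan(num_points, dim, block_size_bytes, bytes_per_coord=4):
    """Blocks read by a full sequential scan.

    Assumption (not from the paper): each point occupies dim * bytes_per_coord
    bytes and is stored entirely within one block.
    """
    points_per_block = block_size_bytes // (dim * bytes_per_coord)
    if points_per_block < 1:
        raise ValueError("block too small to hold a single point")
    return math.ceil(num_points / points_per_block)


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    print(linear_scan_nn(data, (0.9, 1.2)))
    print(blocks_read_by_scan(num_points=1000, dim=16, block_size_bytes=4096))
```

The scan touches every point, so its cost grows linearly with the database size. This is the reference against which an index with a tuned block size has to be measured.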
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berchtold, S., Böhm, C., Keim, D., Krebs, F., Kriegel, HP. (2001). On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces. In: Van den Bussche, J., Vianu, V. (eds) Database Theory — ICDT 2001. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44503-X_28
DOI: https://doi.org/10.1007/3-540-44503-X_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41456-8
Online ISBN: 978-3-540-44503-6
eBook Packages: Springer Book Archive