Skip to main content

High-Dimensional Similarity Search Using Data-Sensitive Space Partitioning

  • Conference paper
Database and Expert Systems Applications (DEXA 2006)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4080))

Included in the following conference series:

Abstract

Nearest neighbor search has a wide variety of applications. Unfortunately, the majority of search methods do not scale well with dimensionality. Recent efforts have been focused on finding better approximate solutions that improve the locality of data using dimensionality reduction. However, it is possible to preserve the locality of data and find exact nearest neighbors in high dimensions without dimensionality reduction. This paper introduces a novel high-performance technique to find exact k-nearest neighbors in both low and high dimensional spaces. It relies on a new method for data-sensitive space partitioning based on explicit data clustering, which is introduced in the paper for the first time. This organization supports effective reduction of the search space before accessing secondary storage. Costly Euclidean distance calculations are reduced through efficient processing of a lightweight memory-based filter. The algorithm outperforms sequential scan and the VA-File in high-dimensional situations. Moreover, the results with dynamic loading of data show that the technique works well on dynamic datasets as well.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aggarwal, C.C.: On the effects of dimensionality reduction on high dimensional similarity search. In: Proc. 20th PODS Conf. pp. 256–266 (2001)

    Google Scholar 

  2. Aggarwal, C.C.: Hierarchical subspace sampling: A unified framework for high dimensional data reduction, selectivity estimation and nearest neighbor search. In: Proc. ACM SIGMOD Conf., pp. 452–463 (2002)

    Google Scholar 

  3. Berchtold, S., Ertl, B., Keim, D., Kriegel, H.P., Seidl, T.: Fast nearest neighbor search in high-dimensional space. In: Proc. 14th ICDE Int. Conf. on Data Engineering, pp. 209–218 (1998)

    Google Scholar 

  4. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999, vol. 1540, pp. 217–235. Springer, Heidelberg (1999)

    Chapter  Google Scholar 

  5. Blott, S., Weber, R.: A simple Vector-Approximation file for similarity search in high-dimensional vector spaces. Technical report, Esprit Project Hermes (no. 9141) (1997)

    Google Scholar 

  6. Fagin, R., Kumar, R., Shivakumar, D.: Efficient similarity search and classification via rank aggregation. In: Proc. ACM SIGMOD Conf., pp. 301–312 (2003)

    Google Scholar 

  7. Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimension via hashing. In: Proc. 25th VLDB Conf., pp. 518–529 (1999)

    Google Scholar 

  8. Hinneburg, A., Aggarwal, C.C., Keim, D.A.: What is nearest neighbor in high dimensional spaces? In: Proc. 26th VLDB Conf., pp. 506–515 (2000)

    Google Scholar 

  9. Katayama, N., Satoh, S.: The SR-tree: An index structure for high-dimensional nearest neighbor queries. SIGMOD Record 26(2), 369–380 (1997)

    Article  Google Scholar 

  10. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proc. 5th Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 281–297 (1967)

    Google Scholar 

  11. Orlandic, R., Lukaszuk, J., Swietlik, C.: The design of a retrieval technique for high-dimensional data on tertiary storage. SIGMOD Record 31(2), 15–21 (2002)

    Article  Google Scholar 

  12. Orlandic, R., Lukaszuk, J.: Efficient high-dimensional indexing by superimposing space-partitioning schemes. In: Proc. 8th International Database Engineering & Applications Symposium IDEAS 2004, pp. 257–264 (2004)

    Google Scholar 

  13. Orlandic, R., Lai, Y., Yee, W.G.: Clustering high-dimensional data using an efficient and effective data space reduction. In: Proc. ACM Conference on Information and Knowledge Management CIKM 2005, pp. 201–208 (2005)

    Google Scholar 

  14. Robinson, J.T.: The K-D-B-Tree: A search structure for large multidimensional dynamic Indexes. In: Proc. ACM SIGMOD Conf., pp. 10–18 (1981)

    Google Scholar 

  15. Sakurai, Y., Yoshikawa, M., Uemura, S., Kojima, H.: The A-tree: An index structure for high-dimensional spaces using relative approximation. In: Proc. 26th VLDB Conf., pp. 516–526 (2000)

    Google Scholar 

  16. Seidl, T., Kriegel, H.P.: Optimal multi-Step k-nearest neighbor search. In: Proc. ACM SIGMOD Conf., pp. 154–165 (1998)

    Google Scholar 

  17. Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity search methods in high-dimensional spaces. In: Proc. 24th VLDB Conf., pp. 194–205 (1998)

    Google Scholar 

  18. Weber, R., Zezula, P.: The theory and practice of similarity searches in high dimensional data spaces (extended abstract). In: 4th DELOS Workshop (1997)

    Google Scholar 

  19. Yu, C., Ooi, B.C., Tan, K.L., Jagadish, H.V.: Indexing the distance: An efficient method to KNN processing. In: Proc. 26th VLDB Conf., pp. 421–430 (2001)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kulkarni, S., Orlandic, R. (2006). High-Dimensional Similarity Search Using Data-Sensitive Space Partitioning. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_72

Download citation

  • DOI: https://doi.org/10.1007/11827405_72

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37871-6

  • Online ISBN: 978-3-540-37872-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics