Skip to main content

Similarity Search and Dimensionality Reduction: Not All Dimensions Are Equally Useful

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2973))

Abstract

Indexing high-dimensional data is a well known problem. Techniques for dimensionality reduction which map D-dimensional objects onto a d-dimensional space (dD) are often used to speedup similarity queries. In this paper we show that one can further improve query performance by initially overestimating the reduction, i.e., reducing the dimensionality of the space to D′ dimensions, where d < D′ < D, and, at query time, automatically choosing only d′, where d′ < d, dimensions to be used – that is, using only a few good dimensions after the initial reduction of the dimensionality. By incorporating this idea within a recently proposed technique, we can process range queries up to three times faster at the expense of limited storage overhead.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional space. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 420–434. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  2. Agrawal, R., Faloutsos, C., Swami, A.N.: Efficient Similarity Search In Sequence Databases. In: Proc. of the 4th Intl. Conf. of Foundations of Data Organization and Algorithms, pp. 69–84 (1993)

    Google Scholar 

  3. Baeza-Yates, R., Cunto, W., Manber, U., Wu, S.: Proximity matching using fixed queries trees. In: Proc. of the 5th Symposium on Combinatorial Pattern Matching, pp. 198–212 (1994)

    Google Scholar 

  4. Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: An index structure for high-dimensional data. In: Proc. of the 22nd Intl. Conf. on Very Large Databases, pp. 28–39 (1996)

    Google Scholar 

  5. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  6. Bozkaya, T., Ozsoyoglu, M.: Distance-based indexing for high-dimensional metric spaces. In: Proc. of the 1997 ACM SIGMOD Intl. Conf. on Management of Data, pp. 357–368 (1997)

    Google Scholar 

  7. Braunmuller, B., Ester, M., Kriegel, H.-P., Sander, J.: Multiple similarity queries: A basic DBMS operation for mining in metric databases. IEEE Transactions on Knowledge and Data Engineering 13(1), 79–95 (2001)

    Article  Google Scholar 

  8. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Proc. of 23rd Intl. Conf. on Very Large Data Bases, pp. 426–435 (1997)

    Google Scholar 

  9. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. of the 2nd Intl. Conf. on Knowledge Discovery and Data Mining, pp. 226–231 (1996)

    Google Scholar 

  10. Santos Filho, R.F., Traina, A., Traina Jr., C., Faloutsos, C.: Similarity search without tears: The OMNI family of all-purpose access methods. In: Proc. of the 17th Intl. Conf. on Data Engineering, pp. 623–630 (2001)

    Google Scholar 

  11. Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: Proc. of ACM SIGMOD Intl. Conf. on Management of Data, pp. 47–57 (1984)

    Google Scholar 

  12. Traina Jr., C., Traina, A., Faloutsos, C., Seeger, B.: Fast indexing and visualization for metric data sets using slim-trees. IEEE Transactions on Knowledge and Data Engineering 14(2), 244–260 (2002)

    Article  Google Scholar 

  13. Katayama, N., Satoh, S.: The SR-tree: An index structure for high-dimensional nearest neighbor queries. In: Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, pp. 369–380 (1997)

    Google Scholar 

  14. Liao, L., Noble, W.S.: Combining pairwise sequence similarity and support vector machinesfor remote protein homology detection. In: Proc. of the 6th Intl. Conf. on Research in Computational Molecular Biology, pp. 225–232 (2002)

    Google Scholar 

  15. Lu, G.: Multimedia Database Management Systems. Artech House, Norwood (1999)

    MATH  Google Scholar 

  16. Sakurai, Y., Yoshikawa, M., Uemura, S., Kojima, H.: The A-tree: An index structure for high-dimensional spaces using relative approximation. In: Proc. of the 26th Intl. Conf. on Very Large Data Bases, pp. 516–526 (2000)

    Google Scholar 

  17. Stehling, R.O., Nascimento, M.A., Falcão, A.X.: MiCRoM: A metric distance to compare segmented images. In: Proc. of the 2002 Visual Information Systems Conf., pp. 12–23 (2002)

    Google Scholar 

  18. Stehling, R.O., Nascimento, M.A., Falcão, A.X.: A compact and efficient image retrieval approach based on border/interior pixel classification. In: Proc. of the 2002 ACM Intl. Conf. on Information and Knowledge Management, pp. 102–109 (2002)

    Google Scholar 

  19. Swain, M.J., Ballard, D.H.: Color indexing. Intl. Journal of Computer Vision 7(1), 11–32 (1991)

    Article  Google Scholar 

  20. Weber, R., Schek, H.-J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proc. of the 24th Intl. Conf. on Very Large Databases, pp. 194–205 (1998)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Digout, C., Nascimento, M.A., Coman, A. (2004). Similarity Search and Dimensionality Reduction: Not All Dimensions Are Equally Useful. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_73

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-24571-1_73

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21047-4

  • Online ISBN: 978-3-540-24571-1

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics