Similarity Search and Dimensionality Reduction: Not All Dimensions Are Equally Useful

Digout, Christian; Nascimento, Mario A.; Coman, Alexandru

doi:10.1007/978-3-540-24571-1_73

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2973)

Abstract

Indexing high-dimensional data is a well-known problem. Dimensionality reduction techniques, which map D-dimensional objects onto a d-dimensional space (d ≪ D), are often used to speed up similarity queries. In this paper we show that one can further improve query performance by initially overestimating the reduction, i.e., reducing the dimensionality of the space to D′ dimensions, where d < D′ < D, and then, at query time, automatically choosing only d′ of those dimensions, where d′ < d. That is, only a few good dimensions are used after the initial reduction of the dimensionality. By incorporating this idea within a recently proposed technique, we can process range queries up to three times faster at the expense of limited storage overhead.
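
The abstract names neither the underlying technique nor the rule used to pick the d′ dimensions, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a pivot-based mapping (each object is stored as its distances to D′ reference points) and a hypothetical "nearest pivots" selection heuristic; the function names `build_index`, `pick_dimensions`, and `range_query` are illustrative and do not come from the paper.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(data, pivots):
    """Store every object as its distances to all D' pivots.

    This is the 'overestimated' reduction: more coordinates are kept
    than any single query will use.
    """
    return [[dist(obj, p) for p in pivots] for obj in data]


def pick_dimensions(query_coords, d_prime):
    """Assumed heuristic (not from the paper): keep the d' pivots
    that lie closest to the query."""
    order = sorted(range(len(query_coords)), key=lambda j: query_coords[j])
    return order[:d_prime]


def range_query(data, index, pivots, query, radius, d_prime):
    """Return (objects within radius of query, number of exact distance checks)."""
    q_coords = [dist(query, p) for p in pivots]
    dims = pick_dimensions(q_coords, d_prime)
    results, checked = [], 0
    for obj, coords in zip(data, index):
        # Triangle inequality: |d(o, p) - d(q, p)| <= d(o, q), so a gap
        # larger than the radius on any chosen pivot rules the object out.
        if any(abs(coords[j] - q_coords[j]) > radius for j in dims):
            continue
        checked += 1
        if dist(obj, query) <= radius:
            results.append(obj)
    return results, checked


if __name__ == "__main__":
    random.seed(0)
    D, D_prime, d_prime = 16, 8, 3  # illustrative sizes only
    data = [[random.random() for _ in range(D)] for _ in range(500)]
    pivots = data[:D_prime]  # assumption: pivots are taken from the data
    index = build_index(data, pivots)
    query = [random.random() for _ in range(D)]
    hits, checked = range_query(data, index, pivots, query, 1.0, d_prime)
    print(len(hits), "results after", checked, "exact distance computations")
```

Because the filter relies only on the triangle inequality, it never discards a true answer regardless of which d′ pivots are chosen; the choice affects only how many candidates survive to the exact distance check. The extra D′ − d stored coordinates per object correspond to the storage overhead the abstract mentions.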

Author information

Authors and Affiliations

Dept. of Computing Science, Univ. of Alberta, Canada
Christian Digout, Mario A. Nascimento & Alexandru Coman

Editor information

Editors and Affiliations

Department of Computer Science, KAIST, 373-1 Guseong-dong Yuseong-gu, 305-701, Daejeon, Korea
YoonJoon Lee
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
Computer Science Department and Advanced Information Technology Research Center (AITrc), KAIST, Korea
Kyu-Young Whang
Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, 305-701, Daejeon, Republic of Korea
Doheon Lee

About this paper

Cite this paper

Digout, C., Nascimento, M.A., Coman, A. (2004). Similarity Search and Dimensionality Reduction: Not All Dimensions Are Equally Useful. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_73

DOI: https://doi.org/10.1007/978-3-540-24571-1_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21047-4
Online ISBN: 978-3-540-24571-1
eBook Packages: Springer Book Archive
