
Fast filtering false active subspaces for efficient high dimensional similarity processing

Science in China Series F: Information Sciences

Abstract

The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects satisfying the request. However, some active query subspaces contain no query results at all; these are called false active query subspaces. Because every access to such a subspace is wasted work, query processing performance degrades in their presence. Our experiments show that this problem becomes serious for high-dimensional data, and that the number of accesses to false active subspaces increases as the dimensionality increases. To solve this problem, this paper proposes a space mapping approach that reduces such unnecessary accesses: a given query space is refined by filtering within its mapped space. To make this refinement efficient, a mapping strategy called maxgap is proposed. Based on this mapping strategy, an index structure called MS-tree and the corresponding query processing algorithms are presented. Finally, the performance of MS-tree is compared with that of competing methods for range queries on a real data set.
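The abstract does not specify how maxgap maps the space or how MS-tree is laid out, so the sketch below only illustrates the general idea of detecting false active subspaces by filtering in a mapped space. It uses a swapped-in, standard metric-space technique, not the authors' maxgap strategy: each point x is mapped to one number, its distance d(x, p) to a single pivot p. A subspace is active when its minimum bounding rectangle (MBR) intersects the query ball. The triangle inequality |d(x, p) - d(q, p)| <= d(x, q) then allows some false active subspaces to be discarded by a binary search over the mapped keys, without scanning their points. The names `Subspace`, `range_query`, the pivot at the origin, and the chunk-based partitioning are all hypothetical.

```python
import bisect
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Subspace:
    """Hypothetical data subspace: an MBR plus the points it holds.

    Assumption: the mapped space is the distance to one pivot, which is a
    generic technique and not the maxgap mapping described in the paper.
    """

    def __init__(self, points, pivot):
        self.points = points
        dim = len(points[0])
        self.lo = [min(p[i] for p in points) for i in range(dim)]
        self.hi = [max(p[i] for p in points) for i in range(dim)]
        # Mapped space: each point becomes a single sorted key d(x, pivot).
        self.keys = sorted(dist(p, pivot) for p in points)

    def mindist(self, q):
        """Smallest possible distance from q to any point inside the MBR."""
        s = 0.0
        for qi, lo, hi in zip(q, self.lo, self.hi):
            if qi < lo:
                s += (lo - qi) ** 2
            elif qi > hi:
                s += (qi - hi) ** 2
        return math.sqrt(s)

    def may_hit(self, dq, r):
        """Necessary condition for a hit: some key lies in [dq - r, dq + r]."""
        i = bisect.bisect_left(self.keys, dq - r)
        return i < len(self.keys) and self.keys[i] <= dq + r


def range_query(subspaces, pivot, q, r):
    """Return (results, stats) for the range query ball(q, r)."""
    dq = dist(q, pivot)
    results = []
    stats = {"active": 0, "filtered": 0, "scanned": 0}
    for s in subspaces:
        if s.mindist(q) > r:
            continue  # inactive: the MBR lies entirely outside the query ball
        stats["active"] += 1
        if not s.may_hit(dq, r):
            stats["filtered"] += 1  # provably false active: skipped without a scan
            continue
        stats["scanned"] += 1
        results.extend(p for p in s.points if dist(p, q) <= r)
    return results, stats


if __name__ == "__main__":
    random.seed(7)
    dim = 16
    data = [[random.random() for _ in range(dim)] for _ in range(2000)]
    # Hypothetical partitioning: chunks of 50 points after sorting on dimension 0.
    data.sort(key=lambda p: p[0])
    pivot = [0.0] * dim
    subspaces = [Subspace(data[i:i + 50], pivot) for i in range(0, len(data), 50)]
    q, r = data[123], 0.9
    hits, stats = range_query(subspaces, pivot, q, r)
    brute = [p for p in data if dist(p, q) <= r]
    assert sorted(map(tuple, hits)) == sorted(map(tuple, brute))
    print(len(hits), stats)
```

Because the pivot test is only a necessary condition, it never discards a true result, but it cannot catch every false active subspace; the ones it misses are still scanned. A mapping designed specifically for this purpose, as the paper's maxgap strategy is described to be, would aim to catch more of them.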



Author information

Correspondence to GuoRen Wang.

Additional information

Supported by the National Basic Research Program of China (Grant No. 2006CB303103), the National Natural Science Foundation of China (Grant Nos. 60873011, 60802026, 60773219, 60773021) and the High Technology Program (Grant No. 2007AA01Z192).


About this article

Cite this article

Wang, G., Yu, G., Xin, J. et al. Fast filtering false active subspaces for efficient high dimensional similarity processing. Sci. China Ser. F-Inf. Sci. 52, 286–294 (2009). https://doi.org/10.1007/s11432-009-0051-7


  • DOI: https://doi.org/10.1007/s11432-009-0051-7
