ABSTRACT
High-dimensional clustering is used by some content-based image retrieval systems to partition the data into groups; the groups (clusters) are then indexed to accelerate the processing of queries. Recently, the Cluster Pruning approach was proposed as a simple way to produce such clusters. While the original evaluation of the algorithm was performed within a text indexing context at a rather small scale, its simplicity motivated us to study its behavior in an image indexing context at a much larger scale. This paper summarizes the results of this study. It shows that while the basic algorithm works fairly well, three extensions dramatically improve its performance and scalability, accelerating both query processing and the construction of clusters. These results make Cluster Pruning a promising basis for building large-scale systems that require a clustering algorithm.
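As a rough illustration of the idea summarized above (partition the descriptors into clusters, then use the clusters to narrow down query processing), the following is a minimal sketch of basic Cluster Pruning as it is commonly described (Chierichetti et al., PODS 2007): pick roughly sqrt(n) random points as leaders, assign every point to its nearest leader, and answer a query by scanning only the clusters of the nearest leaders. It is not the implementation evaluated in this study and does not include the three extensions. The function names and the `probes` parameter are assumptions made for the example.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_clusters(points, seed=0):
    """Pick about sqrt(n) random leaders and assign each point to its nearest leader."""
    rng = random.Random(seed)
    num_leaders = max(1, math.isqrt(len(points)))
    leaders = rng.sample(points, num_leaders)
    clusters = [[] for _ in leaders]
    for p in points:
        nearest = min(range(num_leaders),
                      key=lambda i: squared_distance(p, leaders[i]))
        clusters[nearest].append(p)
    return leaders, clusters


def search(query, leaders, clusters, k=1, probes=1):
    """Approximate k-NN: scan only the clusters of the `probes` nearest leaders."""
    order = sorted(range(len(leaders)),
                   key=lambda i: squared_distance(query, leaders[i]))
    candidates = [p for i in order[:probes] for p in clusters[i]]
    return sorted(candidates, key=lambda p: squared_distance(query, p))[:k]
```

With `probes=1` only one cluster is read per query, so the result is approximate; raising `probes` trades query time for result quality, and probing every cluster reduces to an exact sequential scan.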
Index Terms
- A large-scale performance study of cluster-based high-dimensional indexing