research-article

A large-scale performance study of cluster-based high-dimensional indexing

Authors:
Gylfi Þór Gudmundsson, INRIA Rennes, Rennes, France
Björn Þór Jónsson, Reykjavik University, Reykjavik, Iceland
Laurent Amsaleg, CNRS - IRISA, Rennes, France

VLS-MCMR '10: Proceedings of the international workshop on Very-large-scale multimedia corpus, mining and retrieval, October 2010, Pages 31–36
https://doi.org/10.1145/1878137.1878145

Published: 29 October 2010

ABSTRACT

High-dimensional clustering is used by some content-based image retrieval systems to partition the data into groups; the groups (clusters) are then indexed to accelerate processing of queries. Recently, the Cluster Pruning approach was proposed as a simple way to produce such clusters. While the original evaluation of the algorithm was performed within a text indexing context at a rather small scale, its simplicity motivated us to study its behavior in an image indexing context at a much larger scale. This paper summarizes the results of this study and shows that while the basic algorithm works fairly well, three extensions dramatically improve its performance and scalability, accelerating both query processing and the construction of clusters, making Cluster Pruning a promising basis for building large-scale systems that require a clustering algorithm.
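
As an illustration of the cluster-then-index idea described in the abstract, here is a minimal Python sketch of a basic Cluster Pruning style index. It is not the authors' implementation and does not include the three extensions the abstract mentions. The function names (`build_index`, `search`), the choice of roughly sqrt(n) randomly sampled leaders, and the `probes` parameter are assumptions made for this example.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_index(points, seed=0):
    """Pick about sqrt(n) random points as leaders (an assumption of this
    sketch) and assign every point to the cluster of its nearest leader."""
    rng = random.Random(seed)
    num_leaders = max(1, math.isqrt(len(points)))
    leaders = rng.sample(points, num_leaders)
    clusters = [[] for _ in leaders]
    for p in points:
        nearest = min(range(num_leaders),
                      key=lambda i: squared_distance(p, leaders[i]))
        clusters[nearest].append(p)
    return leaders, clusters


def search(query, leaders, clusters, k=1, probes=1):
    """Approximate k-nearest-neighbor search: scan only the clusters of the
    `probes` leaders closest to the query, then rank those candidates."""
    order = sorted(range(len(leaders)),
                   key=lambda i: squared_distance(query, leaders[i]))
    candidates = [p for i in order[:probes] for p in clusters[i]]
    return sorted(candidates, key=lambda p: squared_distance(query, p))[:k]
```

With `probes=1` the search reads a single cluster, which is fast but may miss the true nearest neighbor; raising `probes` trades query time for result quality, and setting it to the number of leaders degenerates into an exact sequential scan.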

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
VLS-MCMR '10: Proceedings of the international workshop on Very-large-scale multimedia corpus, mining and retrieval
October 2010, 68 pages
ISBN: 9781450301664
DOI: 10.1145/1878137
Program Chairs:
Benoit Huet, EURECOM, France
Tat-Seng Chua, National University of Singapore, Singapore
Alexander Hauptmann, Carnegie Mellon University, USA
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
approximate search method
high dimensional
nearest neighbor
scalability