ABSTRACT
Hash tables have been proposed for indexing high-dimensional binary vectors, specifically for identifying media by their fingerprints. In this paper we develop a new model that predicts the performance of a hash-based method (Fingerprint Hashing) under varying levels of noise. We show that adjusting two parameters yields robustness to higher levels of noise. We extend Fingerprint Hashing to a multi-table range search (Extended Fingerprint Hashing) and show that this approach also increases robustness to noise. We then show the relationship between Extended Fingerprint Hashing and Locality Sensitive Hashing and investigate design choices for dealing with higher noise levels. If index size must be held constant, Extended Fingerprint Hashing is the superior method. We also show that, to achieve similar performance at a given level of noise, a Locality Sensitive Hash requires a nearly six-fold increase in index size, which is likely to be impractical for many applications.
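To make the trade-off between table count and index size concrete, the following is a minimal sketch of generic bit-sampling Locality Sensitive Hashing for binary vectors under Hamming distance. It is not the Fingerprint Hashing or Extended Fingerprint Hashing scheme evaluated in the paper. The class name `BitSamplingIndex`, its parameters (`dim`, `num_tables`, `bits_per_key`), and the representation of a fingerprint as a Python integer are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of differing bits between two fingerprints stored as ints."""
    return bin(a ^ b).count("1")


def key_for(fingerprint, positions):
    """Build a hash key by sampling the given bit positions of the fingerprint."""
    key = 0
    for p in positions:
        key = (key << 1) | ((fingerprint >> p) & 1)
    return key


class BitSamplingIndex:
    """Hypothetical multi-table index: each table hashes on a random bit subset."""

    def __init__(self, dim, num_tables, bits_per_key, seed=0):
        rng = random.Random(seed)
        # One random subset of bit positions per table (assumed design choice).
        self.positions = [rng.sample(range(dim), bits_per_key)
                          for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in self.positions]
        self.items = {}

    def add(self, item_id, fingerprint):
        # Every item is stored once per table, so index size grows with num_tables.
        self.items[item_id] = fingerprint
        for pos, table in zip(self.positions, self.tables):
            table[key_for(fingerprint, pos)].append(item_id)

    def query(self, fingerprint, max_distance):
        # Collect candidates that collide with the query in at least one table,
        # then verify each candidate with an exact Hamming distance check.
        candidates = set()
        for pos, table in zip(self.positions, self.tables):
            candidates.update(table.get(key_for(fingerprint, pos), ()))
        hits = [(hamming(fingerprint, self.items[i]), i) for i in candidates]
        return sorted((d, i) for d, i in hits if d <= max_distance)


index = BitSamplingIndex(dim=32, num_tables=4, bits_per_key=8)
index.add("track-a", 0x0F0F0F0F)
index.add("track-b", 0xF0F0F0F0)  # bitwise complement of track-a
print(index.query(0x0F0F0F0F, max_distance=5))
```

In this sketch a noisy query is found only if at least one table samples no flipped bits, so tolerating more noise means adding tables (a larger index) or using shorter keys (more candidates to verify). That is the kind of design choice the abstract refers to.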
Index Terms
- A comparison of extended fingerprint hashing and locality sensitive hashing for binary audio fingerprints