Active hashing and its application to image and text retrieval

Data Mining and Knowledge Discovery

Abstract

In recent years, hashing-based methods for large-scale similarity search have sparked considerable research interest in the data mining and machine learning communities. While unsupervised hashing-based methods have achieved promising success for metric similarity, they cannot handle semantic similarity, which is usually given in the form of labeled point pairs. To overcome this limitation, some attempts have recently been made at semi-supervised hashing, which aims to learn hash functions from both metric and semantic similarity simultaneously. Existing semi-supervised hashing methods can be regarded as passive hashing since they assume that the labeled pairs are provided in advance. In this paper, we propose a novel framework, called active hashing, which actively selects the most informative labeled pairs for hash function learning. Specifically, it identifies the most informative points to label and constructs labeled pairs accordingly. Under this framework, we use data uncertainty as a measure of informativeness and develop a batch-mode algorithm to speed up active selection. We empirically compare our method with a state-of-the-art passive hashing method on two benchmark data sets, showing that the proposed method can reduce labeling cost as well as overcome the limitations of passive hashing.
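
The selection loop described in the abstract (score unlabeled points by uncertainty, pick a batch, then build labeled pairs from the labels obtained) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes linear hash functions of the form h_k(x) = sign(w_k · x) and uses the mean absolute projection margin as a stand-in uncertainty measure. The function names `hash_code`, `uncertainty`, `select_batch`, and `build_pairs` are hypothetical and chosen only for illustration.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))

def hash_code(x, W):
    """Binary code of x under assumed linear hash functions h_k(x) = sign(w_k . x)."""
    return [1 if dot(w, x) >= 0 else -1 for w in W]

def uncertainty(x, W):
    """Assumed uncertainty score: negative mean absolute margin.
    Points lying close to the hash hyperplanes get higher scores."""
    margins = [abs(dot(w, x)) for w in W]
    return -sum(margins) / len(margins)

def select_batch(pool, W, batch_size):
    """Indices of the batch_size most uncertain points in the unlabeled pool."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: uncertainty(pool[i], W),
                    reverse=True)
    return ranked[:batch_size]

def build_pairs(indices, labels):
    """Construct labeled pairs from point labels:
    +1 for a same-label (similar) pair, -1 for a different-label pair."""
    pairs = []
    for a in range(len(indices)):
        for b in range(a + 1, len(indices)):
            i, j = indices[a], indices[b]
            pairs.append((i, j, 1 if labels[i] == labels[j] else -1))
    return pairs

# Toy usage: two hash functions over 2-D points (all values are made up).
W = [[1.0, 0.0], [0.0, 1.0]]
pool = [[0.1, 0.0], [2.0, 1.0], [-0.05, 0.02], [-3.0, 2.0]]
labels = ["cat", "dog", "cat", "dog"]  # revealed only for the selected points

chosen = select_batch(pool, W, batch_size=2)
pairs = build_pairs(chosen, labels)
```

In this toy run the two points nearest the hyperplanes are selected, and their labels yield a single similar pair. A real batch-mode method would typically also account for redundancy among the selected points, which this sketch ignores.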

Author information

Corresponding author

Correspondence to Yi Zhen.

Additional information

Responsible editor: Bing Liu.

About this article

Cite this article

Zhen, Y., Yeung, DY. Active hashing and its application to image and text retrieval. Data Min Knowl Disc 26, 255–274 (2013). https://doi.org/10.1007/s10618-012-0249-y

  • DOI: https://doi.org/10.1007/s10618-012-0249-y
