Hierarchical deep semantic hashing for fast image retrieval

Multimedia Tools and Applications

Abstract

Content-based large-scale image retrieval has recently attracted considerable attention because of the explosive growth of online images. Inspired by recent advances in convolutional neural networks, we propose a hierarchical deep semantic method for learning a similarity function that addresses both the precision and the speed of retrieval in large-scale settings. The distinctive contribution of our work is a novel approach that exploits prior knowledge of the semantic hierarchy: when semantic information and a related hierarchy structure are available, significant improvements can be attained. Exploiting hierarchical relationships is the most important factor for large-scale problems. Binary codes are learned by a deep neural network to represent the latent concepts that dominate the semantic labels. Unlike other supervised methods, which require learning an explicit hash function that maps images to binary codes, our method learns the Hierarchical Deep Semantic Hashing code (HDSH-code) and the image representation implicitly, making it suitable for large-scale datasets. An additional contribution is a novel hashing scheme, generated at the same time as the semantic information, that reduces the computational cost of retrieval. Comprehensive experiments were conducted on the Holidays, Oxford5k/105k, and Caltech256 retrieval datasets; HDSH performs competitively even when the convolutional neural network has been pre-trained for a surrogate unrelated task. We further demonstrate its efficacy and scalability on ImageNet, a large-scale dataset with millions of images. With hierarchical deep semantic hashing, we report retrieval times of 0.15 ms and 53.92 ms on the Holidays and ImageNet datasets, respectively.
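To illustrate why binary codes make retrieval cheap, the following is a minimal sketch of generic hash-based retrieval, not the HDSH method itself. It assumes that a network emits per-bit activations in [0, 1] and that codes are obtained by thresholding at 0.5; the function names `binarize`, `hamming`, and `rank_by_hamming`, the threshold, and the toy activations are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def binarize(activations: Sequence[float], threshold: float = 0.5) -> int:
    """Pack thresholded activations into one integer (first value = most significant bit).

    The 0.5 threshold is an assumption for this sketch, not a value from the paper.
    """
    code = 0
    for a in activations:
        code = (code << 1) | (1 if a >= threshold else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int], top_k: int = 3) -> List[int]:
    """Return indices of the top_k database codes closest to the query."""
    order = sorted(range(len(database)), key=lambda i: hamming(query, database[i]))
    return order[:top_k]


# Toy 4-bit example with made-up activations (illustrative only).
database = [binarize(v) for v in [
    [0.9, 0.1, 0.8, 0.2],
    [0.1, 0.9, 0.2, 0.7],
    [0.8, 0.2, 0.9, 0.6],
]]
query = binarize([0.95, 0.05, 0.7, 0.1])
ranking = rank_by_hamming(query, database)
print(ranking)
```

Comparing two codes costs one XOR and a bit count, which is what makes Hamming ranking much cheaper than comparing real-valued feature vectors.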

Acknowledgments

This work was supported in part by the Natural Science Foundation of China under Grants U1536203 and 61272409, in part by the Major Scientific and Technological Innovation Project of Hubei Province under Grant 2015AAA013, and in part by the Nature Science Foundation of the Open University of China under Grants G16F3702Z and G16F2505Q. The authors appreciate the valuable suggestions from the anonymous reviewers and the editors.

Author information

Corresponding author

Correspondence to Hefei Ling.

About this article

Cite this article

Ou, X., Ling, H., Liu, S. et al. Hierarchical deep semantic hashing for fast image retrieval. Multimed Tools Appl 76, 21281–21302 (2017). https://doi.org/10.1007/s11042-016-4057-z

  • DOI: https://doi.org/10.1007/s11042-016-4057-z
