A hash centroid construction method with Swin transformer for multi-label image retrieval

  • Original Article
  • Neural Computing and Applications

Abstract

Quantization-based hashing methods have become increasingly popular because, compared with pairwise/triplet similarity-based methods, they adjust the global data distribution and capture data similarity more accurately. However, existing image quantization hashing approaches adopt fixed hash centers, which account for neither the semantic information of each hash center nor the scale of each object appearing in a multi-label image; as a result, each hash code deviates from its corresponding hash centroid. To address this issue, we propose HCCST, a hash centroid construction method with Swin transformer for multi-label image retrieval. HCCST consists of a hash code generation module, a hash centroid construction module, and an interaction module between each hash code and its corresponding hash centroid. In the hash code generation module, we first adopt Swin transformer to extract the feature vector of each input multi-label image and then generate the initial hash code of this image. In the hash centroid construction module, we first use object semantic information to construct semantic hash centers and then account for object scale by learning an object weight coefficient, from which the hash centroid of each sample is computed. After obtaining both the hash code and the hash centroid of each sample, the interaction module constrains the distance between each hash code and its hash centroid to preserve the similarity between samples. The model is trained end to end, alternately updating the network parameters of the hash code generation module, the hash centroid construction module and the object weight coefficient. We conduct extensive experiments on three multi-label image datasets: VOC2012, MS-COCO and NUS-WIDE. The experimental results demonstrate that HCCST achieves better retrieval performance than state-of-the-art image hashing methods.
The open-source code of this project is released at: https://github.com/lzHZWZ/HCCST.git.
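
The abstract does not give formulas, so the following is only a minimal sketch of the hash centroid idea and not the authors' implementation (see the repository above for that). It assumes that each label has a semantic hash center in {-1, +1}, that the centroid of a sample is the binarized weighted sum of the centers of its labels, and that the interaction term is a mean squared distance between a relaxed hash code and the centroid. The function names, the 8-bit centers and the weight values are all hypothetical.

```python
from typing import Dict, List, Sequence


def sign(x: float) -> int:
    """Binarize to {-1, +1}; zero maps to +1 (an arbitrary convention here)."""
    return 1 if x >= 0 else -1


def hash_centroid(centers: Dict[str, Sequence[int]],
                  weights: Dict[str, float]) -> List[int]:
    """Weighted sum of per-label hash centers, then binarized.

    `weights` holds one nonnegative coefficient per label present in the
    image (assumed here to reflect object scale); it is normalized below.
    """
    labels = list(weights)
    n_bits = len(centers[labels[0]])
    total = sum(weights.values())
    mixed = [
        sum(weights[label] * centers[label][k] for label in labels) / total
        for k in range(n_bits)
    ]
    return [sign(v) for v in mixed]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions in which two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def centroid_loss(relaxed_code: Sequence[float],
                  centroid: Sequence[int]) -> float:
    """Mean squared distance between a real-valued code and its centroid."""
    return sum((h - c) ** 2 for h, c in zip(relaxed_code, centroid)) / len(centroid)


if __name__ == "__main__":
    # Hypothetical 8-bit semantic hash centers for two labels.
    centers = {
        "person": [1, 1, -1, -1, 1, -1, 1, -1],
        "dog": [1, -1, 1, -1, -1, 1, 1, -1],
    }
    # Hypothetical weights: the person occupies more of the image than the dog.
    centroid = hash_centroid(centers, {"person": 0.7, "dog": 0.3})
    print(centroid, hamming(centroid, centers["dog"]))
```

In this toy setting the label with the larger weight dominates every bit on which the two centers disagree, which illustrates how a per-sample centroid can differ from a fixed, unweighted hash center.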

Data availability statement

The datasets generated and/or analyzed during the current study are available in the open-source GitHub repository: https://github.com/lzHZWZ/HCCST.git.

Acknowledgements

The authors gratefully acknowledge the support of the National Natural Science Foundation of China (Grant No. 62232007), the Research on the supporting technologies of the metaverse in cultural Media (PT252022039), the Guangzhou Science and Technology Planning Project (No. 202201010529), the National Natural Science Foundation of China (No. 61902135), the National Natural Science Foundation of China (No. 61871139) and the International Science and Technology Cooperation Projects of Guangdong Province (No. 2020A0505100060).

Author information

Corresponding author

Correspondence to Yangtao Wang.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

About this article

Cite this article

Xie, Y., Wang, Y., Wei, R. et al. A hash centroid construction method with Swin transformer for multi-label image retrieval. Neural Comput & Applic 35, 10891–10907 (2023). https://doi.org/10.1007/s00521-023-08273-x

  • DOI: https://doi.org/10.1007/s00521-023-08273-x
