Abstract
Quantization-based hashing methods have become increasingly popular because, compared with pairwise/triplet similarity-based methods, they can adapt to the global data distribution and capture data similarity more accurately. However, existing image quantization hashing approaches adopt fixed hash centers, which consider neither the semantic information of each hash center nor the scale of each object appearing in a multi-label image. As a result, each hash code deviates from its corresponding hash centroid. To address this issue, we propose HCCST, a hash centroid construction method with Swin transformer for multi-label image retrieval. HCCST consists of a hash code generation module, a hash centroid construction module, and an interaction module between each hash code and its corresponding hash centroid. In the hash code generation module, we first adopt Swin transformer to extract a feature vector for each input multi-label image and then generate the initial hash code of the image. In the hash centroid construction module, we first use object semantic information to construct semantic hash centers and then account for object scale by learning object weight coefficients, from which the hash centroid of each sample is computed. Once both the hash code and the hash centroid of each sample are available, the interaction module constrains the distance between each hash code and its hash centroid to preserve the similarity between samples. The model is trained end-to-end, alternately updating the network parameters of the hash code generation module, those of the hash centroid construction module, and the object weight coefficients. We conduct extensive experiments on three multi-label image datasets: VOC2012, MS-COCO and NUS-WIDE. The experimental results demonstrate that HCCST achieves better retrieval performance than state-of-the-art image hashing methods.
The open-source code of this project is released at: https://github.com/lzHZWZ/HCCST.git.
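To make the relationship between hash centers, object weight coefficients and hash centroids concrete, the following is a minimal sketch rather than the released implementation. It assumes that a sample's hash centroid is a normalized weighted sum of the hash centers of the objects present in the image, and that the interaction module penalizes the mean squared distance between a relaxed hash code and that centroid. The function names (`hash_centroid`, `centroid_loss`, `binarize`), the toy centers and the weights are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names, toy values and the exact loss form are
# assumptions, not taken from the HCCST code base.

def hash_centroid(centers, labels, weights):
    """Weighted combination of the hash centers of the objects in one image.

    centers: list of K hash centers, each a list of +1/-1 values
    labels:  list of K 0/1 indicators (1 = object present in the image)
    weights: list of K non-negative object weight coefficients
    """
    present = [k for k, y in enumerate(labels) if y == 1]
    if not present:
        raise ValueError("image has no labeled object")
    total = sum(weights[k] for k in present)
    if total <= 0:
        raise ValueError("weights of present objects must sum to a positive value")
    bits = len(centers[0])
    # Normalize so the centroid stays inside the [-1, 1] range of each bit.
    return [sum(weights[k] * centers[k][b] for k in present) / total
            for b in range(bits)]


def centroid_loss(code, centroid):
    """Mean squared distance between a relaxed hash code and its centroid."""
    return sum((c - t) ** 2 for c, t in zip(code, centroid)) / len(code)


def binarize(vector):
    """Map a real-valued vector to a +1/-1 hash code by its sign."""
    return [1 if x >= 0 else -1 for x in vector]


# Toy example: 3 object classes, 4-bit codes.
centers = [[1, 1, -1, -1],
           [1, -1, 1, -1],
           [-1, -1, -1, -1]]
labels = [1, 1, 0]            # classes 0 and 1 appear in the image
weights = [0.75, 0.25, 0.0]   # class 0 is assumed to be the larger object

centroid = hash_centroid(centers, labels, weights)
loss = centroid_loss([1.0, 1.0, -1.0, -1.0], centroid)
```

In this toy setting the centroid is pulled toward the center of the more heavily weighted object, so an image dominated by class 0 is binarized to the hash center of class 0. A fixed-center scheme would ignore the weights and treat both objects equally.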
Data availability statement
The datasets generated during and/or analyzed during the current study are available in the open-source GitHub repository: https://github.com/lzHZWZ/HCCST.git
Acknowledgements
The authors gratefully acknowledge the support of the National Natural Science Foundation of China (No. 62232007), the project Research on the Supporting Technologies of the Metaverse in Cultural Media (PT252022039), the Guangzhou Science and Technology Planning Project (No. 202201010529), the National Natural Science Foundation of China (No. 61902135), the National Natural Science Foundation of China (No. 61871139), and the International Science and Technology Cooperation Projects of Guangdong Province (No. 2020A0505100060).
Ethics declarations
Conflict of interests
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Xie, Y., Wang, Y., Wei, R. et al. A hash centroid construction method with Swin transformer for multi-label image retrieval. Neural Comput & Applic 35, 10891–10907 (2023). https://doi.org/10.1007/s00521-023-08273-x
DOI: https://doi.org/10.1007/s00521-023-08273-x