Abstract
Zero-shot object detection (ZSD) is dedicated to the task of precisely localizing and identifying unfamiliar objects that have not been encountered before. In this paper, a contrastive semantic association network is proposed to address the knowledge transfer challenge from seen classes to unseen ones in ZSD. It enables efficient information propagation through similarity-based connections, thereby establishing a clearer link between seen and unseen categories. Moreover, a visual-semantic contrastive learning technique is developed to mitigate the node convergence issue caused by the graph structure of the proposed network. By emphasizing the visual and semantic distinctiveness across different categories, the proposed model leverages semantic information and graph structure knowledge to enhance the generalization capability of seen and unseen feature projection. Extensive experiments demonstrate the superior performance of our model compared to other zero-shot object detection methods, showcasing notable improvement in mean average precision (mAP) on the MS-COCO dataset. The code and models are publicly available at: https://github.com/lihh1023/CSA-ZSD/tree/master.
Similar content being viewed by others
References
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 1440–1448
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:1–2
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 2980–2988
Wang H, Peng J, Jiang G, Xu F, Fu X (2021) Discriminative feature and dictionary learning with part-aware model for vehicle re-identification. Neurocomputing 438:55–62
Wang H, Peng J, Chen D, Jiang G, Zhao T, Fu X (2020) Attribute-guided feature learning network for vehicle reidentification. IEEE MultiMedia 27(4):112–121
Wang H, Peng J, Zhao Y, Fu X (2020) Multi-path deep cnns for fine-grained car recognition. IEEE Trans Veh Technol 69(10):10484–10493
Zhu P, Wang H, Saligrama V (2019) Zero shot detection. IEEE Trans Circuits Syst Video Technol 30(4):998–1010
Li Z, Yao L, Zhang X, Wang X, Kanhere S, Zhang H (2019) Zero-shot object detection with textual descriptions. In: Proceedings of the AAAI conference on artificial intelligence, vol. 33, pp. 8690–8697
Bansal A, Sikka K, Sharma G, Chellappa R, Divakaran A (2018) Zero-shot object detection. In: Proceedings of the European conference on computer vision (ECCV), pp. 384–400
Rahman S, Khan S, Porikli F (2018) Zero-shot object detection: Learning to simultaneously recognize and localize novel concepts. In: Asian conference on computer vision, pp. 547–563. Springer
Rahman S, Khan S, Barnes N (2020) Improved visual-semantic alignment for zero-shot object detection. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, pp. 11932–11939
Yan C, Chang X, Luo M, Liu H, Zhang X, Zheng Q (2022) Semantics-guided contrastive network for zero-shot object detection. IEEE transactions on pattern analysis and machine intelligence
Li Q, Zhang Y, Sun S, Zhao X, Li K, Tan M (2021) Rethinking semantic-visual alignment in zero-shot object detection via a softplus margin focal loss. Neurocomputing 449:117–135
Rahman S, Khan S, Barnes N (2019) Transductive learning for zero-shot object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 6082–6091
Hayat N, Hayat M, Rahman S, Khan S, Zamir SW, Khan FS (2020) Synthesizing the unseen for zero-shot object detection. In: Proceedings of the Asian conference on computer vision
Zhao S, Gao C, Shao Y, Li L, Yu C, Ji Z, Sang N (2020) Gtnet: Generative transfer network for zero-shot object detection. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, pp. 12967–12974
Zhu, P., Wang, H., Saligrama, V.: Don’t even look once: Synthesizing features for zero-shot detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11693–11702 (2020)
Zhang L, Wang X, Yao L, Wu L, Zheng F (2020) Zero-shot object detection via learning an embedding from semantic space to visual space. In: Twenty-Ninth international joint conference on artificial intelligence and seventeenth pacific rim international conference on artificial intelligence IJCAI-PRICAI-20. International joint conferences on artificial intelligence organization
Mao Q, Wang C, Yu S, Zheng Y, Li Y (2020) Zero-shot object detection with attributes-based category similarity. IEEE Trans Circuits Syst II Express Briefs 67(5):921–925
Nie H, Wang R, Chen X (2022) From node to graph: Joint reasoning on visual-semantic relational graph for zero-shot detection. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 1109–1118
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788
Zheng Y, Huang R, Han C, Huang X, Cui L (2020) Background learnable cascade for zero-shot object detection. In: Proceedings of the Asian conference on computer vision
Yan C, Zheng Q, Chang X, Luo M, Yeh C-H, Hauptman AG (2020) Semantics-preserving graph propagation for zero-shot object detection. IEEE Trans Image Process 29:8163–8176
Wang H, Jiang G, Peng J, Deng R, Fu X (2022) Towards adaptive consensus graph: Multi-view clustering via graph collaboration. IEEE Trans Multimed
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model. IEEE transactions on neural networks 20(1):61–80
Welling M, Kipf TN (2016) Semi-supervised classification with graph convolutional networks. In: J. international conference on learning representations (ICLR 2017)
Marino K, Salakhutdinov R, Gupta A (2016) The more you know: using knowledge graphs for image classification. arXiv:1612.04844
Han G, He Y, Huang S, Ma J, Chang S-F (2021) Query adaptive few-shot object detection with heterogeneous graph convolutional networks. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 3263–3272
Chen B, Zhang J, Zhang X, Dong Y, Song J, Zhang P, Xu K, Kharlamov E, Tang J (2022) Gccad: Graph contrastive learning for anomaly detection. IEEE Trans Knowl Data Eng 01:1–14
Wang X, Ye Y, Gupta A (2018) Zero-shot recognition via semantic embeddings and knowledge graphs. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6857–6866
Kampffmeyer M, Chen Y, Liang X, Wang H, Zhang Y, Xing EP (2019) Rethinking knowledge graph propagation for zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11487–11496
Wei J, Sun H, Yang Y, Xu X, Li J, Shen HT (2022) Semantic guided knowledge graph for large-scale zero-shot learning. J Vis Commun Image Represent 88:103629
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol. 2, pp. 1735– 1742 (2006). IEEE
Liu X, Zhang F, Hou Z, Mian L, Wang Z, Zhang J, Tang J (2021) Self-supervised learning: Generative or contrastive. IEEE Trans Knowl Data Eng
Wu Z, Xiong Y, Yu SX, Lin D (2018) Unsupervised feature learning via non-parametric instance discrimination. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3733–3742
He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9729–9738
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR
Wang X, Qi G-J (2022) Contrastive learning with stronger augmentations. IEEE Trans Pattern Anal Mach Intell
Sun B, Li B, Cai S, Yuan, Y, Zhang C (2021) Fsce: Few-shot object detection via contrastive proposal encoding. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 7352–7362
Huang L, Dai S, He Z (2022) Few-shot object detection with semantic enhancement and semantic prototype contrastive learning. Knowl-Based Syst 252:109411
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117– 2125
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision, pp. 740–755. Springer
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 26
Yang C, Wu W, Wang Y, Zhou H (2022) A novel feature-based model for zero-shot object detection with simulated attributes. Appl Intell 52(6):6905–6914
Huang P, Han J, Cheng D, Zhang D (2022) Robust region feature synthesizer for zero-shot object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 7622–7631
Sarma S, Kumar S, Sur A (2022) Resolving semantic confusions for improved zero-shot detection. In: 33rd British machine vision conference(BMVC)
Acknowledgements
This work was supported by the Ningbo Municipal Natural Science Foundation of China (No. 2022J114) and Innovation Challenge Project of China (Ningbo) (No. 2022T001).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, H., Wang, C., Liu, W. et al. Zero-shot object detection with contrastive semantic association network. Appl Intell 53, 30056–30068 (2023). https://doi.org/10.1007/s10489-023-05117-y
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-023-05117-y