Zero-shot object detection with contrastive semantic association network

Li, Haohe; Wang, Chong; Liu, Weijie; Gong, Yilin; Dai, Xinmiao

doi:10.1007/s10489-023-05117-y

Published: 09 November 2023

Applied Intelligence, Volume 53, pages 30056–30068 (2023)

Haohe Li¹, Chong Wang¹ (ORCID: orcid.org/0000-0001-6016-6545), Weijie Liu¹,², Yilin Gong¹ & Xinmiao Dai¹

Abstract

Zero-shot object detection (ZSD) aims to localize and recognize objects from categories that were never encountered during training. In this paper, we propose a contrastive semantic association network to address the challenge of transferring knowledge from seen classes to unseen ones in ZSD. The network propagates information efficiently through similarity-based connections, establishing a clearer link between seen and unseen categories. Moreover, we develop a visual-semantic contrastive learning technique to mitigate the node convergence issue introduced by the network's graph structure, in which node features become increasingly similar. By emphasizing the visual and semantic distinctiveness of different categories, the proposed model exploits semantic information and graph-structure knowledge to improve how well the feature projection generalizes to both seen and unseen classes. Extensive experiments on the MS-COCO dataset show that our model outperforms other zero-shot object detection methods, with notable improvements in mean average precision (mAP). The code and models are publicly available at: https://github.com/lihh1023/CSA-ZSD/tree/master.
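The abstract describes information propagating through similarity-based connections between seen and unseen categories, and a node convergence issue caused by the graph structure. The sketch below illustrates both ideas under stated assumptions: it builds a k-nearest-neighbour graph over class embeddings using cosine similarity and applies parameter-free mean-aggregation steps, a simplified form of graph convolution. The toy vectors, `k`, and every function name are hypothetical; this is not the authors' CSA-ZSD implementation, which lives in the linked repository.

```python
import math
from typing import Dict, List

Vector = List[float]
Graph = Dict[str, Dict[str, float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_similarity_graph(emb: Dict[str, Vector], k: int = 2) -> Graph:
    """Link every class to its k most similar classes plus itself (hypothetical scheme).

    Negative similarities are clipped to zero and each node's outgoing
    weights are normalised to sum to 1 (a row-stochastic adjacency).
    """
    graph: Graph = {}
    for u in emb:
        ranked = sorted(((cosine(emb[u], emb[v]), v) for v in emb if v != u), reverse=True)
        weights = {v: max(s, 0.0) for s, v in ranked[:k]}
        weights[u] = 1.0  # self-loop: keep part of each node's own semantics
        total = sum(weights.values())
        graph[u] = {v: w / total for v, w in weights.items()}
    return graph


def propagate(graph: Graph, feats: Dict[str, Vector]) -> Dict[str, Vector]:
    """One propagation step: each node becomes the weighted mean of its neighbours."""
    dim = len(next(iter(feats.values())))
    return {
        u: [sum(w * feats[v][i] for v, w in nbrs.items()) for i in range(dim)]
        for u, nbrs in graph.items()
    }


def spread(feats: Dict[str, Vector]) -> float:
    """Largest per-dimension gap between nodes; 0 means all nodes are identical."""
    return max(max(col) - min(col) for col in zip(*feats.values()))


# Toy 3-d "word vectors" (assumed); tiger plays the role of an unseen class.
emb = {
    "cat": [1.0, 0.9, 0.1],
    "dog": [0.9, 1.0, 0.2],
    "car": [0.1, 0.2, 1.0],
    "tiger": [1.0, 0.7, 0.0],
}
graph = build_similarity_graph(emb, k=2)
nearest = max((w, v) for v, w in graph["tiger"].items() if v != "tiger")[1]
print(nearest)  # → cat

feats = emb
for _ in range(50):
    feats = propagate(graph, feats)
print(spread(emb), spread(feats))  # initial spread vs. spread after 50 steps
```

Here the unseen `tiger` node draws most of its neighbour weight from `cat`, the kind of seen-to-unseen link the abstract describes. Repeating the propagation drives `spread` toward zero, a small-scale picture of node features converging, which is the problem the contrastive term is meant to counter.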

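The abstract does not spell out the visual-semantic contrastive objective. As one hedged illustration, the sketch below uses a standard InfoNCE-style loss between a region feature and a set of class embeddings: the region is pulled toward its own class and pushed away from all others, with cosine similarities scaled by a temperature `tau`. The function name, `tau = 0.1`, and the toy vectors are assumptions, not the paper's formulation.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def visual_semantic_contrastive_loss(region: Vector, label: str,
                                     class_embs: Dict[str, Vector],
                                     tau: float = 0.1) -> float:
    """InfoNCE-style loss for one region feature (hypothetical formulation).

    Logits are cosine similarities to every class embedding divided by tau;
    the loss is the cross-entropy of the true class against all classes.
    """
    r = normalize(region)
    logits = {c: sum(a * b for a, b in zip(r, normalize(e))) / tau
              for c, e in class_embs.items()}
    m = max(logits.values())  # log-sum-exp with max subtraction for stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits.values()))
    return log_denominator - logits[label]


# Toy 2-d class embeddings (assumed).
classes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [-1.0, 0.0]}
aligned = visual_semantic_contrastive_loss([0.9, 0.1], "cat", classes)
misaligned = visual_semantic_contrastive_loss([0.1, 0.9], "cat", classes)

# Class embeddings that have collapsed to one vector, as after heavy propagation.
collapsed = {c: [1.0, 1.0] for c in classes}
stuck = visual_semantic_contrastive_loss([0.9, 0.1], "cat", collapsed)
print(f"aligned={aligned:.4f} misaligned={misaligned:.4f} collapsed={stuck:.4f}")
```

When the class embeddings have collapsed to the same vector, every logit is equal and the loss is fixed at log C for C classes regardless of the region feature, so lowering it requires the class embeddings to become distinguishable again. This is one way a contrastive term can push back against node convergence.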
Acknowledgements

This work was supported by the Ningbo Municipal Natural Science Foundation of China (No. 2022J114) and the Innovation Challenge Project of China (Ningbo) (No. 2022T001).

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, 315000, Zhejiang, China
Haohe Li, Chong Wang, Weijie Liu, Yilin Gong & Xinmiao Dai
Information and Control Engineering, Shenzhen Anker Innovation, Shenzhen, 518055, Guangdong, China
Weijie Liu

Corresponding author

Correspondence to Chong Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Wang, C., Liu, W. et al. Zero-shot object detection with contrastive semantic association network. Appl Intell 53, 30056–30068 (2023). https://doi.org/10.1007/s10489-023-05117-y

Accepted: 17 October 2023
Published: 09 November 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10489-023-05117-y