Zero-shot object detection with contrastive semantic association network

Applied Intelligence

Abstract

Zero-shot object detection (ZSD) aims to localize and classify objects from categories that were never seen during training. In this paper, a contrastive semantic association network is proposed to address the challenge of transferring knowledge from seen classes to unseen ones in ZSD. The network enables efficient information propagation through similarity-based connections, thereby establishing a clearer link between seen and unseen categories. Moreover, a visual-semantic contrastive learning technique is developed to mitigate the node convergence issue caused by the graph structure of the proposed network. By emphasizing the visual and semantic distinctiveness of different categories, the proposed model leverages semantic information and graph structure knowledge to improve how well the feature projection generalizes across seen and unseen classes. Extensive experiments on the MS-COCO dataset show that the model outperforms other zero-shot object detection methods, with a notable improvement in mean average precision (mAP). The code and models are publicly available at: https://github.com/lihh1023/CSA-ZSD/tree/master.
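
As a rough illustration of the two ideas named in the abstract, the sketch below builds a similarity-based graph over class embeddings and computes a contrastive loss between region features and class embeddings. It is not the authors' implementation: the function names, the k-nearest-neighbour graph, the unweighted averaging step, and the InfoNCE-style loss are all assumptions chosen for brevity, and visual features are assumed to be already projected into the same space as the semantic embeddings.

```python
import numpy as np

def similarity_graph(class_embeddings, k=2):
    """Row-normalized adjacency linking each class to its k most similar classes.

    Hypothetical helper: cosine similarity and top-k selection are assumptions.
    """
    unit = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)          # exclude self when ranking neighbours
    n = sim.shape[0]
    adj = np.eye(n)                          # self-loops keep each node's own embedding
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return adj / adj.sum(axis=1, keepdims=True)

def propagate(class_embeddings, adj):
    """One step of neighbourhood averaging (no learned weights in this sketch)."""
    return adj @ class_embeddings

def contrastive_loss(visual, semantic, labels, temperature=0.1):
    """InfoNCE-style loss: pull each region feature toward its own class
    embedding and push it away from the embeddings of all other classes."""
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    s = semantic / np.linalg.norm(semantic, axis=1, keepdims=True)
    logits = (v @ s.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()
```

Repeated averaging over a graph tends to pull neighbouring node embeddings toward each other, which is one way to read the node convergence issue the abstract mentions. A contrastive term of this kind counteracts that by rewarding embeddings that stay separable across categories.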

Acknowledgements

This work was supported by the Ningbo Municipal Natural Science Foundation of China (No. 2022J114) and Innovation Challenge Project of China (Ningbo) (No. 2022T001).

Author information

Corresponding author

Correspondence to Chong Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Wang, C., Liu, W. et al. Zero-shot object detection with contrastive semantic association network. Appl Intell 53, 30056–30068 (2023). https://doi.org/10.1007/s10489-023-05117-y

  • DOI: https://doi.org/10.1007/s10489-023-05117-y
