
Object Detection Algorithm Based on Bimodal Feature Alignment

  • Conference paper
  • Pattern Recognition (ACPR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14406)


Abstract

We propose an object detection algorithm based on bimodal feature alignment for the full fusion of visible and infrared image features. First, we design a two-stream detection model that takes aligned visible and infrared image pairs as simultaneous input. Second, we design a gated fusion network consisting of a bimodal feature alignment module and a feature fusion module; it performs mid-level fusion, serving as the middle layers of the two-stream backbone. The bimodal feature alignment module extracts detailed information from the aligned bimodal features by computing multi-scale bimodal alignment feature vectors. The feature fusion module recalibrates the fused bimodal features and then multiplies them with the aligned bimodal features, achieving cross-modal fusion that jointly enhances low-level and high-level features. We validate the proposed algorithm on the public KAIST pedestrian dataset and a self-built GIR dataset. On KAIST, the algorithm achieves 77.1% accuracy, 17.3% and 5.6% higher than the baseline YOLOv5-s detecting visible and infrared images alone, respectively; on GIR, it achieves 91% accuracy, 1.2% and 14.2% higher than the baseline on visible and infrared images alone, respectively. Its speed meets real-time requirements.
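The full paper is paywalled here, so the exact module design is not available; the following is only a loose, hypothetical sketch of the gate-then-recalibrate-then-multiply flow the abstract describes. All names (`gated_fusion`, `w_gate`) and the specific pooling/gating choices are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_vis, f_ir, w_gate):
    """Hypothetical sketch of a gated bimodal fusion block.

    f_vis, f_ir : (C, H, W) feature maps from the two backbone streams
                  (assumed spatially pre-aligned, as in registered image pairs).
    w_gate      : (2C, C) projection producing per-channel gate logits
                  (an assumed, learnable parameter).
    """
    # Derive a per-channel gate from pooled statistics of both modalities.
    pooled = np.concatenate([f_vis.mean(axis=(1, 2)),
                             f_ir.mean(axis=(1, 2))])        # (2C,)
    gate = sigmoid(pooled @ w_gate)                          # (C,)

    # Recalibrate: gate-weighted combination of the two streams.
    fused = gate[:, None, None] * f_vis + (1.0 - gate)[:, None, None] * f_ir

    # Cross-modal enhancement: multiply the recalibrated fused features
    # with the element-wise aligned (summed) bimodal features.
    aligned = f_vis + f_ir
    return fused * aligned
```

In a mid-fusion design like the one described, a block of this kind would sit between corresponding stages of the two backbone streams, feeding the fused map back into both (or into a shared neck), rather than fusing only at the input or the detection head.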

Supported in part by the National Natural Science Foundation of China under Grant No. 62072370 and in part by the Natural Science Foundation of Shaanxi Province under Grant No. 2023-JC-YB-598.




Author information

Correspondence to Ying Sun.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sun, Y., Hou, Z., Yang, C., Ma, S., Fan, J. (2023). Object Detection Algorithm Based on Bimodal Feature Alignment. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14406. Springer, Cham. https://doi.org/10.1007/978-3-031-47634-1_30


  • DOI: https://doi.org/10.1007/978-3-031-47634-1_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47633-4

  • Online ISBN: 978-3-031-47634-1

  • eBook Packages: Computer Science (R0)
