Abstract
A bimodal feature alignment based object detection algorithm is proposed to fully fuse visible and infrared image features. First, we propose a two-stream detection model that accepts paired visible and infrared images as simultaneous input. Second, we design a gated fusion network consisting of a bimodal feature alignment module and a feature fusion module; it performs mid-level fusion, serving as the middle layer of the two-stream backbone network. In particular, the bimodal feature alignment module extracts detailed information from the aligned bimodal features by computing a multi-scale bimodal alignment feature vector. The feature fusion module recalibrates the fused bimodal features and then multiplies them with the aligned bimodal features, achieving cross-modal fusion that jointly enhances low-level and high-level features. We validate the proposed algorithm on both the publicly available KAIST pedestrian dataset and a self-built GIR dataset. On the KAIST dataset, the algorithm achieves an accuracy of 77.1%, which is 17.3% and 5.6% higher than the baseline YOLOv5-s detecting visible and infrared images alone, respectively; on the self-built GIR dataset, it achieves a detection accuracy of 91%, which is 1.2% and 14.2% higher than the baseline on visible and infrared images alone, respectively. The detection speed meets real-time requirements.
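The gated fusion described above, aligning the two modal feature maps, recalibrating the fused features channel-wise, and multiplying the result with the aligned features, can be sketched as a PyTorch module. This is a minimal illustration assuming SE-style (squeeze-and-excitation) recalibration and 1×1 alignment convolutions; the module name `GatedFusion`, the `reduction` parameter, and the exact layer choices are assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Illustrative sketch of a gated fusion layer for a two-stream
    backbone: align RGB/IR features, recalibrate the fused map, then
    gate the aligned features with the recalibration weights."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # 1x1 convolutions project each modality into a shared space
        # (assumed alignment mechanism)
        self.align_rgb = nn.Conv2d(channels, channels, kernel_size=1)
        self.align_ir = nn.Conv2d(channels, channels, kernel_size=1)
        # Concatenated bimodal features are reduced back to `channels`
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # SE-style channel recalibration producing a gate in (0, 1)
        self.recalib = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb: torch.Tensor, f_ir: torch.Tensor) -> torch.Tensor:
        a_rgb = self.align_rgb(f_rgb)
        a_ir = self.align_ir(f_ir)
        aligned = a_rgb + a_ir                        # aligned bimodal features
        fused = self.fuse(torch.cat([a_rgb, a_ir], dim=1))
        gate = self.recalib(fused)                    # recalibrated fused map
        return aligned * gate                         # cross-modal gated output
```

A layer like this could be inserted at an intermediate stage of each backbone level, which is what "mid-level fusion as the middle layer of the two-stream backbone" suggests; the output keeps the input resolution and channel count, so it drops into a standard detection neck unchanged.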
Acknowledgements
This work is supported by the National Natural Science Foundation of China under grant No. 62072370 and the Natural Science Foundation of Shaanxi Province under grant No. 2023-JC-YB-598.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sun, Y., Hou, Z., Yang, C., Ma, S., Fan, J. (2023). Object Detection Algorithm Based on Bimodal Feature Alignment. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14406. Springer, Cham. https://doi.org/10.1007/978-3-031-47634-1_30
DOI: https://doi.org/10.1007/978-3-031-47634-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47633-4
Online ISBN: 978-3-031-47634-1
eBook Packages: Computer Science (R0)