Abstract
Multispectral pedestrian detection is a promising task in computer vision and deep learning because of its robustness under challenging conditions such as adverse illumination, occlusion, and low-resolution imaging. Both the input data and the network layers can benefit from multispectral fusion. However, because fusing the visible and thermal modalities is computationally expensive, it cannot be performed efficiently in many applications in this domain, such as autonomous vehicles, robotics, security, and monitoring systems. Despite this, most recent work has treated accuracy as the dominant criterion and paid less attention to speed. To address this gap, this paper proposes a fast Multispectral Deconvolutional MobileNetV2-based Single Shot Detector (MDSSD-MobV2) that balances the two crucial goals of accuracy and speed. This lightweight multispectral network is built on a novel high-resolution MobileNetV2 followed by DSSD auxiliary layers, which fuse visible and thermal feature maps at multiple levels of the network architecture. The primary goal of this research is a framework for a blind-assistance detector that is light enough to run on embedded systems with limited memory and processing power while retaining the required accuracy. Evaluation on the well-known KAIST benchmark shows that the method attains sufficient speed on the Nvidia Jetson TX2 while staying within a miss-rate margin of less than 1% of recent solutions, and that it improves on their best speed by more than 3.5× on the non-embedded GTX 1080Ti platform.
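The abstract says that the DSSD auxiliary layers fuse visible and thermal feature maps at several levels, but it does not name the fusion operator. As a rough illustration only, the pure-Python sketch below assumes one common choice, channel concatenation followed by a 1×1 convolution, applied independently at each pyramid level. The names `concat_channels`, `conv1x1`, and `fuse_levels` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# A feature map is stored as [channels][height][width].
FeatureMap = List[List[List[float]]]


def concat_channels(visible: FeatureMap, thermal: FeatureMap) -> FeatureMap:
    """Stack thermal channels after visible channels (spatial sizes must match)."""
    if (len(visible[0]), len(visible[0][0])) != (len(thermal[0]), len(thermal[0][0])):
        raise ValueError("visible and thermal maps must share height and width")
    return visible + thermal


def conv1x1(x: FeatureMap, weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> FeatureMap:
    """1x1 convolution: each output channel is a per-pixel weighted sum of input channels."""
    height, width = len(x[0]), len(x[0][0])
    out: FeatureMap = []
    for o, row in enumerate(weights):
        if len(row) != len(x):
            raise ValueError("each weight row needs one entry per input channel")
        out.append([
            [bias[o] + sum(row[c] * x[c][i][j] for c in range(len(x))) for j in range(width)]
            for i in range(height)
        ])
    return out


def fuse_levels(visible_levels, thermal_levels, weights_per_level, bias_per_level):
    """Hypothetical multi-level fusion: concatenation + 1x1 conv at every pyramid level."""
    if not (len(visible_levels) == len(thermal_levels)
            == len(weights_per_level) == len(bias_per_level)):
        raise ValueError("need one visible map, thermal map, weight matrix and bias per level")
    return [
        conv1x1(concat_channels(v, t), w, b)
        for v, t, w, b in zip(visible_levels, thermal_levels, weights_per_level, bias_per_level)
    ]


if __name__ == "__main__":
    # Two toy pyramid levels (2x2 and 1x1), one channel per modality.
    vis = [[[[1.0, 1.0], [1.0, 1.0]]], [[[4.0]]]]
    thr = [[[[2.0, 2.0], [2.0, 2.0]]], [[[0.0]]]]
    w = [[[0.5, 0.5]], [[0.5, 0.5]]]  # average the two modalities at both levels
    b = [[0.0], [0.0]]
    print(fuse_levels(vis, thr, w, b))
```

In a real detector the 1×1 weights would be learned, and element-wise addition or attention-based weighting are equally plausible alternatives; the sketch only shows where a per-level fusion step would sit.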
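Results on KAIST are conventionally reported as a log-average miss rate, which summarizes the curve of miss rate against false positives per image (FPPI). The abstract does not spell out the protocol, so the sketch below assumes the common Caltech/KAIST convention: sample the curve at nine FPPI values evenly spaced in log space over [10^-2, 10^0] and take the geometric mean of the miss rates found there. The function name `log_average_miss_rate` is hypothetical, and details such as tie handling may differ from the official evaluation code.

```python
import math
from typing import Sequence, Tuple


def log_average_miss_rate(curve: Sequence[Tuple[float, float]]) -> float:
    """Log-average miss rate over FPPI in [1e-2, 1e0].

    `curve` holds (fppi, miss_rate) operating points, e.g. obtained by sweeping
    the detector's score threshold from high to low.
    """
    # Stable sort by FPPI only, so tied points keep their original threshold order.
    points = sorted(curve, key=lambda p: p[0])
    # Nine reference FPPI values: 10^-2, 10^-1.75, ..., 10^0.
    refs = [10 ** (-2 + 0.25 * i) for i in range(9)]
    sampled = []
    for ref in refs:
        miss = 1.0  # no operating point at or below this FPPI: every pedestrian is missed
        for fppi, mr in points:
            if fppi > ref:
                break
            miss = mr  # keep the last operating point whose FPPI does not exceed ref
        sampled.append(max(miss, 1e-10))  # guard against log(0)
    # Geometric mean of the sampled miss rates.
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


if __name__ == "__main__":
    # Toy curve, not real KAIST numbers.
    print(log_average_miss_rate([(0.005, 0.4), (0.5, 0.1)]))
```

Because the mean is taken in log space, improvements at low miss rates weigh as much as improvements at high ones, which is why small absolute margins between detectors can still be meaningful.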
Acknowledgements
Not applicable
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
Ehsan Fazl-ersi and Hamid Noori conceived the presented idea and supervised the project. Fereshteh Aghaee developed the theory and performed the modeling. Ehsan Fazl-ersi verified the analytical methods. Hamid Noori encouraged the investigation of a real-time, optimized platform. All authors discussed the results and contributed to the final manuscript.
Ethics declarations
Ethics approval
The paper reflects the authors’ own research and analysis in a truthful and complete manner and has not been previously published elsewhere. The results are appropriately placed in the context of prior and existing research. All sources used are properly disclosed. All authors have been personally and actively involved in substantial work leading to the paper and will take public responsibility for its content.
Consent to participate
Not applicable
Consent for publication
Not applicable
Conflicts of interest/Competing interests
The authors certify that there is no actual or potential conflict of interest in relation to this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aghaee, F., Fazl-Ersi, E. & Noori, H. MDSSD-MobV2: An embedded deconvolutional multispectral pedestrian detection based on SSD-MobileNetV2. Multimed Tools Appl 83, 43801–43829 (2024). https://doi.org/10.1007/s11042-023-17188-7