MDSSD-MobV2: An embedded deconvolutional multispectral pedestrian detection based on SSD-MobileNetV2

Aghaee, Fereshteh; Fazl-Ersi, Ehsan; Noori, Hamid

doi:10.1007/s11042-023-17188-7

Published: 16 October 2023

Volume 83, pages 43801–43829, (2024)

Multimedia Tools and Applications


Abstract

Multispectral pedestrian detection is a promising task in computer vision and deep learning because of its robustness in challenging conditions such as adverse illumination, occlusion, and low-resolution imaging. Both the input data and the network layers can benefit from multispectral fusion. However, fusing the visible and thermal modalities is computationally expensive, so it cannot be done efficiently in many applications in this domain, such as autonomous vehicles, robotics, security, and monitoring systems. Despite this, most recent efforts have treated accuracy as the dominant criterion and have paid less attention to speed. As a solution, this paper proposes a fast Multispectral Deconvolutional MobileNetV2-based Single Shot Detector (MDSSD-MobV2) that effectively balances the two crucial goals of accuracy and speed. This lightweight multispectral network is built upon a novel high-resolution MobileNetV2 followed by DSSD auxiliary layers, which fuse visible and thermal feature maps at multiple levels of the network architecture. The research is primarily aimed at developing a framework for a blind assistance detector that is light enough to run on embedded systems with low memory and processing power while still retaining the required accuracy. Evaluation results on the well-known KAIST benchmark show that the method attains sufficient speed on the Nvidia Jetson TX2 while keeping up with recent solutions, with a miss rate margin of less than 1%, and improves on their best speed by more than 3.5× on the non-embedded GTX 1080Ti platform.
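To give a rough sense of why a MobileNet-style backbone is considered lightweight, the following sketch compares the multiply-accumulate count of a standard convolution with that of a depthwise separable convolution, the factorization introduced in the original MobileNets work. The layer sizes are hypothetical and are not taken from the paper; the sketch also ignores the expansion layers of MobileNetV2's inverted residual block, so it illustrates the general idea rather than the authors' network.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates of a standard k x k convolution
    producing an h x w output map (stride 1, 'same' padding)."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates of a depthwise k x k convolution
    followed by a pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in * h * w      # one k x k filter per input channel
    pointwise = c_in * c_out * h * w      # 1 x 1 conv mixes the channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
k, c_in, c_out, h, w = 3, 64, 128, 40, 32

std = standard_conv_macs(k, c_in, c_out, h, w)
sep = depthwise_separable_macs(k, c_in, c_out, h, w)
ratio = sep / std  # analytically equal to 1/c_out + 1/k**2

print(f"standard conv:        {std:,} MACs")
print(f"depthwise separable:  {sep:,} MACs")
print(f"cost ratio:           {ratio:.4f}")
```

For 3 × 3 kernels the ratio approaches 1/9 as the number of output channels grows, which is the usual argument for using such layers on embedded hardware.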

Code or data availability

https://github.com/angel9785/MDSSD-MOBv2.git

Acknowledgements

Not applicable

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Electrical Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
Fereshteh Aghaee
Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
Ehsan Fazl-Ersi & Hamid Noori

Contributions

Ehsan Fazl-Ersi and Hamid Noori conceived the presented idea and supervised the project. Fereshteh Aghaee developed the theory and performed the modeling. Ehsan Fazl-Ersi verified the analytical methods. Hamid Noori encouraged the investigation of a real-time, optimized platform. All authors discussed the results and contributed to the final manuscript.

Corresponding author

Correspondence to Ehsan Fazl-Ersi.

Ethics declarations

Ethics approval

The paper reflects the authors’ own research and analysis in a truthful and complete manner and has not been previously published elsewhere. The results are appropriately placed in the context of prior and existing research. All sources used are properly disclosed. All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.

Consent to participate

Not applicable

Consent for publication

Not applicable

Conflicts of interest/Competing interests

The authors certify that there is no actual or potential conflict of interest in relation to this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aghaee, F., Fazl-Ersi, E. & Noori, H. MDSSD-MobV2: An embedded deconvolutional multispectral pedestrian detection based on SSD-MobileNetV2. Multimed Tools Appl 83, 43801–43829 (2024). https://doi.org/10.1007/s11042-023-17188-7

Received: 29 June 2022
Revised: 13 September 2023
Accepted: 19 September 2023
Published: 16 October 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s11042-023-17188-7
