MDSSD-MobV2: An embedded deconvolutional multispectral pedestrian detection based on SSD-MobileNetV2

Multimedia Tools and Applications

Abstract

Multispectral pedestrian detection is a promising task in computer vision and deep learning because of its robustness under challenging conditions such as adverse illumination, occlusion, and low-resolution imaging. Multispectral fusion can benefit both the input data and the network layers. However, the high computational cost of fusing visible and thermal modalities prevents it from running efficiently in many applications in this domain, such as autonomous vehicles, robotics, security, and monitoring systems. Nevertheless, most recent efforts have treated accuracy as the dominant criterion and have paid less attention to speed. As a solution, this paper proposes a fast Multispectral Deconvolutional MobileNetV2-based Single Shot Detector (MDSSD-MobV2) that balances the two crucial goals of accuracy and speed. This lightweight multispectral network is built upon a novel high-resolution MobileNetV2 followed by DSSD auxiliary layers, which fuse visible and thermal feature maps at multiple levels of the network architecture. The research is primarily aimed at developing a framework for a blind-assistance detector that is light enough to run on embedded systems with low memory and processing power while retaining the required accuracy. Evaluation on the well-known KAIST benchmark shows that the method attains sufficient speed on the Nvidia Jetson TX2 while staying within a miss-rate margin of less than 1% of recent solutions, and that it improves on their best speed by more than 3.5× on the non-embedded GTX 1080Ti platform.
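The two headline quantities in the abstract, a miss-rate margin and a speed-up factor, can be illustrated with a small sketch. It assumes the plain definition of miss rate (missed ground-truth pedestrians divided by all ground-truth pedestrians) and reads the "less than 1%" margin as a difference in percentage points; the abstract does not state the exact evaluation protocol, and benchmark evaluations often report a log-average miss rate instead. The function names, counts, and frame rates below are hypothetical and are not taken from the paper.

```python
def miss_rate(true_positives: int, false_negatives: int) -> float:
    """Fraction of ground-truth pedestrians that the detector failed to find."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no ground-truth pedestrians to evaluate")
    return false_negatives / total


def speedup(candidate_fps: float, baseline_fps: float) -> float:
    """How many times faster the candidate runs than the baseline."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return candidate_fps / baseline_fps


# Hypothetical counts, chosen only to illustrate the arithmetic.
baseline_mr = miss_rate(true_positives=900, false_negatives=100)
candidate_mr = miss_rate(true_positives=895, false_negatives=105)

# Margin expressed in percentage points (an assumed reading of "less than 1%").
margin_points = (candidate_mr - baseline_mr) * 100

# Hypothetical frame rates for a candidate and the fastest prior method.
factor = speedup(candidate_fps=35.0, baseline_fps=10.0)

print(f"miss-rate margin: {margin_points:.2f} percentage points")
print(f"speed-up: {factor:.1f}x")
```

With these made-up numbers the candidate misses slightly more pedestrians than the baseline but runs several times faster, which is the kind of trade-off the abstract describes.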

Code or data availability

https://github.com/angel9785/MDSSD-MOBv2.git

Acknowledgements

Not applicable

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Contributions

Ehsan Fazl-Ersi and Hamid Noori conceived the presented idea and supervised the project. Fereshteh Aghaee developed the theory and performed the modeling. Ehsan Fazl-Ersi verified the analytical methods. Hamid Noori encouraged the investigation of a real-time, optimized platform. All authors discussed the results and contributed to the final manuscript.

Corresponding author

Correspondence to Ehsan Fazl-Ersi.

Ethics declarations

Ethics approval

The paper reflects the authors’ own research and analysis in a truthful and complete manner and has not been previously published elsewhere. The results are appropriately placed in the context of prior and existing research. All sources used are properly disclosed. All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.

Consent to participate

Not applicable

Consent for publication

Not applicable

Conflicts of interest/Competing interests

The authors certify that there is no actual or potential conflict of interest in relation to this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aghaee, F., Fazl-Ersi, E. & Noori, H. MDSSD-MobV2: An embedded deconvolutional multispectral pedestrian detection based on SSD-MobileNetV2. Multimed Tools Appl 83, 43801–43829 (2024). https://doi.org/10.1007/s11042-023-17188-7

  • DOI: https://doi.org/10.1007/s11042-023-17188-7
