
L-SSD: lightweight SSD target detection based on depth-separable convolution

Research article published in the Journal of Real-Time Image Processing.

Abstract

Deep-learning-based object detection algorithms involve many redundant convolution computations, which makes them difficult to deploy on low-power mobile devices such as intelligent inspection robots and autonomous vehicles. To address this problem, we propose L-SSD, a lightweight object detection algorithm based on depth-separable convolution. First, we adopt the lightweight network MobileNetv2 as the backbone feature extraction network and propose an upsampling feature fusion module (UFFM) to fuse the output feature maps of MobileNetv2; introducing deep semantic information into the shallow feature maps improves feature extraction capability while reducing model complexity. Second, we propose a local–global feature extraction module (LGFEM) that generates five additional feature layers, enlarging the receptive field of the feature maps and improving detection accuracy. Then, we use an improved weighted bidirectional feature pyramid network (BiFPN) to build a new feature pyramid that fully exploits the feature information across different layers. Finally, we propose asymmetric spatial attention (ASA), which strengthens the feature representations before BiFPN fusion and provides good positional information for the feature pyramid. Experiments on the PASCAL VOC and MS COCO datasets show that, compared with SSD, L-SSD reduces the model parameters and model complexity by 85.9% and 96.1%, respectively, while reaching detection accuracies of 73.8% and 22.4% on the two datasets and a detection speed of 106 frames per second on an NVIDIA GeForce RTX 3060. L-SSD thus achieves an optimal balance of model parameters, model complexity, detection accuracy, and speed.
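The parameter savings that motivate building L-SSD on depth-separable (depthwise separable) convolution can be sketched with a simple count: a depthwise k × k convolution followed by a 1 × 1 pointwise convolution replaces a standard k × k convolution. The channel and kernel sizes below are illustrative only, not taken from the L-SSD architecture.

```python
def standard_conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative layer: 256 input channels, 256 output channels, 3 x 3 kernel.
c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)        # 589824
sep = depthwise_separable_params(c_in, c_out, k)  # 67840
print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

For this layer the separable variant needs roughly 1/c_out + 1/k² ≈ 11.5% of the standard parameters, which is the kind of reduction that makes the overall 85.9% parameter cut reported above plausible when applied network-wide.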


Data availability

Not applicable.


Acknowledgements

This work was supported by Key-Area Research and Development Program of Guangdong Province under Grant (Funding No. 2020B0909020001) and National Natural Science Foundation of China (Funding No. 61573113).

Author information


Contributions

All persons who meet the authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for its content, including participation in the conception, design, analysis, writing, or revision of the manuscript. HW and HQ contributed the conception, methodology, study design, software, and the original draft. SF and WW revised the grammar and wording. All authors reviewed the manuscript.

Corresponding author

Correspondence to Huaming Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. 

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, H., Qian, H., Feng, S. et al. L-SSD: lightweight SSD target detection based on depth-separable convolution. J Real-Time Image Proc 21, 33 (2024). https://doi.org/10.1007/s11554-024-01413-z
