
L-SSD: lightweight SSD target detection based on depth-separable convolution

Research article published in the Journal of Real-Time Image Processing.

Abstract

Deep-learning-based object detection algorithms involve many redundant convolution computations, which makes them difficult to deploy on low-power mobile devices such as intelligent inspection robots and autonomous vehicles. To address this problem, we propose L-SSD, a lightweight object detection algorithm based on depth-separable convolution. First, we adopt the lightweight network MobileNetv2 as the backbone feature extraction network and propose an upsampling feature fusion module (UFFM) to fuse the output feature maps of MobileNetv2; introducing deep semantic information into the shallow feature maps improves feature extraction capability while reducing model complexity. Second, we propose a local–global feature extraction module (LGFEM) that generates five additional feature layers, enlarging the receptive field of the feature maps and improving detection accuracy. Then, we use an improved weighted bidirectional feature pyramid network (BiFPN) to build a new feature pyramid that fully exploits the feature information across different layers. Finally, we propose asymmetric spatial attention (ASA), which strengthens the feature representations before BiFPN fusion and provides good positional information for the feature pyramid. Experiments on the PASCAL VOC and MS COCO datasets show that, compared with SSD, L-SSD reduces the model parameters and model complexity by 85.9% and 96.1%, respectively, while reaching detection accuracies of 73.8% and 22.4% on the two datasets and a detection speed of 106 frames per second on an NVIDIA GeForce RTX 3060. L-SSD thus achieves an optimal balance of model parameters, model complexity, detection accuracy, and speed.
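The parameter savings that motivate building L-SSD on depth-separable (depthwise separable) convolution can be sketched with a simple count: a depthwise k × k convolution followed by a 1 × 1 pointwise convolution replaces a standard k × k convolution. The channel and kernel sizes below are illustrative only, not taken from the L-SSD architecture.

```python
def standard_conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative layer: 256 input channels, 256 output channels, 3 x 3 kernel.
c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)        # 589824
sep = depthwise_separable_params(c_in, c_out, k)  # 67840
print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

For this layer the separable variant needs roughly 1/c_out + 1/k² ≈ 11.5% of the standard parameters, which is the kind of reduction that makes the overall 85.9% parameter cut reported above plausible when applied network-wide.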


Data availability

Not applicable.


Acknowledgements

This work was supported by Key-Area Research and Development Program of Guangdong Province under Grant (Funding No. 2020B0909020001) and National Natural Science Foundation of China (Funding No. 61573113).

Author information


Contributions

All persons who meet the authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for its content, including participation in the conception, design, analysis, writing, or revision of the manuscript. HW and HQ contributed the conception, methodology, study design, software, and the original draft. SF and WW revised the grammar and wording. All authors reviewed the manuscript.

Corresponding author

Correspondence to Huaming Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. 

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, H., Qian, H., Feng, S. et al. L-SSD: lightweight SSD target detection based on depth-separable convolution. J Real-Time Image Proc 21, 33 (2024). https://doi.org/10.1007/s11554-024-01413-z
