Skip to main content
Log in

YOLO-ERF: lightweight object detector for UAV aerial images

  • Regular Paper
  • Published:
Multimedia Systems Aims and scope Submit manuscript

Abstract

The application of object detection techniques in the field of unmanned aerial vehicles (UAVs) is an important research direction in computer vision. Because object detection in UAV aerial images needs to meet real-time requirements, a challenging problem in this technology is the trade-off between network parameters and detection accuracy. To solve this problem, this paper proposes a lightweight object detector family named YOLO-ERF. First, this paper proposes the effective receptive field (ERF) module, which can increase the convolutional kernel receptive field while preserving local details. The ERF module is then used to design a lightweight backbone to expand the network receptive field without the need for attaching additional context modules after the backbone to expand the receptive field. In addition, the proposed detectors use the ERF module to critically optimize the path aggregation network structure to improve accuracy with reduced network parameters. Finally, a lightweight detection head is proposed to improve small object recognition in complex backgrounds. With these optimizations, the YOLO-ERF models in this paper achieved a better trade-off between accuracy and parameters than other mainstream models, achieving strong results on the VisDrone and COCO datasets. YOLO-ERF-T reduced the number of network parameters by 40.3% when compared with YOLOv7-Tiny while increasing the average accuracy by 2.4% and 1.9%, respectively, in VisDrone and COCO datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

Data availibility

The data that support the findings of this study are available from the corresponding author, Ning He, upon reasonable request.

References

  1. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)

  2. Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

  3. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015)

  4. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: Computer vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pp. 21–37. Springer, Cham (2016)

    Chapter  Google Scholar 

  5. Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., Berg, A.C.: Dssd: deconvolutional single shot detector. arXiv preprint arXiv:1701.06659 (2017)

  6. Li, Z., Zhou, F.: Fssd: feature fusion single shot multibox detector. arXiv preprint arXiv:1712.00960 (2017)

  7. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

  8. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)

  9. Redmon, J., Farhadi, A.: Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

  10. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)

  11. Glenn, J.: YOLOv5 release v6.1. https://github.com/ultralytics/yolov5/releases/tag/v6.1 (2022)

  12. Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021)

  13. Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696 (2022)

  14. Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., Nie, W., et al.: Yolov6: a single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022)

  15. Glenn, J.: Ultralytics YOLOv8. https://github.com/ultralytics/ultralytics (2023)

  16. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: Computer Vision-ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13, pp. 740–755. Springer, Cham (2014)

    Chapter  Google Scholar 

  17. Yu, G., Chang, Q., Lv, W., Xu, C., Cui, C., Ji, W., Dang, Q., Deng, K., Wang, G., Du, Y., et al.: Pp-picodet: a better real-time object detector on mobile devices. arXiv preprint arXiv:2111.00902 (2021)

  18. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018)

  19. Du, D., Zhu, P., Wen, L., Bian, X., Lin, H., Hu, Q., Peng, T., Zheng, J., Wang, X., Zhang, Y., et al: Visdrone-det2019: The vision meets drone object detection in image challenge results. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)

  20. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)

  21. Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFS. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)

    Article  Google Scholar 

  22. Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., Cottrell, G.: Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, pp. 1451–1460 (2018)

  23. Liu, S., Huang, D., et al: Receptive field block net for accurate and fast object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 385–400 (2018)

  24. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)

  25. Cai, Z., Vasconcelos, N.: Cascade r-cnn: Delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018)

  26. Dai, J., Li, Y., He, K., Sun, J.: R-fcn: Object detection via region-based fully convolutional networks. Adv Neural Inf Process Syst 29 (2016)

  27. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

  28. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)

    Article  Google Scholar 

  29. Wong, A., Famuori, M., Shafiee, M.J., Li, F., Chwyl, B., Chung, J.: Yolo nano: a highly compact you only look once convolutional neural network for object detection. In: 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS), IEEE, pp. 22–25 (2019)

  30. Hu, L., Li, Y.: Micro-yolo: exploring efficient methods to compress CNN based object detection model. In: ICAART (2), pp. 151–158 (2021)

  31. Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: Ghostnet: More features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580–1589 (2020)

  32. Cai, Y., Li, H., Yuan, G., Niu, W., Li, Y., Tang, X., Ren, B., Wang, Y.: Yolobile: real-time object detection on mobile devices via compression-compilation co-design. Proc. AAAI Conf. Artif. Intell. 35, 955–963 (2021)

    Google Scholar 

  33. Chen, C., Zhang, Y., Lv, Q., Wei, S., Wang, X., Sun, X., Dong, J.: Rrnet: a hybrid detector for object detection in drone-captured images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)

  34. Zhang, P., Zhong, Y., Li, X.: Slimyolov3: narrower, faster and better for real-time UAV applications. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)

  35. Zhang, X., Izquierdo, E., Chandramouli, K.: Dense and small object detection in UAV vision based on cascade network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)

  36. Wang, H., Wang, Z., Jia, M., Li, A., Feng, T., Zhang, W., Jiao, L.: Spatial attention for multi-scale feature refinement for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)

  37. Zhang, R., Shao, Z., Huang, X., Wang, J., Li, D.: Object detection in UAV images via global density fused convolutional network. Remote Sens. 12(19), 3140 (2020)

    Article  Google Scholar 

  38. Jadhav, A., Mukherjee, P., Kaushik, V., Lall, B.: Aerial multi-object tracking by detection using deep association networks. In: 2020 National Conference on Communications (NCC), IEEE, pp. 1–6 (2020)

  39. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)

  40. Yu, W., Yang, T., Chen, C.: Towards resolving the challenge of long-tail distribution in UAV images for object detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3258–3267 (2021)

  41. Tian, G., Liu, J., Yang, W.: A dual neural network for object detection in UAV images. Neurocomputing 443, 292–301 (2021)

    Article  Google Scholar 

  42. Zhang, R., Shao, Z., Huang, X., Wang, J., Wang, Y., Li, D.: Adaptive dense pyramid network for object detection in UAV imagery. Neurocomputing 489, 377–389 (2022)

    Article  Google Scholar 

  43. Li, G., Zhang, J., Zhang, M., Wu, R., Cao, X., Liu, W.: Efficient depthwise separable convolution accelerator for classification and UAV object detection. Neurocomputing 490, 1–16 (2022)

    Article  Google Scholar 

  44. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  45. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)

  46. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)

  47. Rao, L.: Treenet: a lightweight one-shot aggregation convolutional network. arXiv preprint arXiv:2109.12342 (2021)

  48. Lee, Y., Hwang, J.-w., Lee, S., Bae, Y., Park, J.: An energy and GPU-computation efficient backbone network for real-time object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)

  49. Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Scaled-yolov4: scaling cross stage partial network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13029–13038 (2021)

  50. Wang, C.-Y., Liao, H.-Y.M., Yeh, I.-H.: Designing network design strategies through gradient path analysis. arXiv preprint arXiv:2211.04800 (2022)

  51. Gao, R.: Rethink dilated convolution for real-time semantic segmentation. arXiv preprint arXiv:2111.09957 (2021)

  52. Wu, Y., Chen, Y., Yuan, L., Liu, Z., Wang, L., Li, H., Fu, Y.: Rethinking classification and localization for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10186–10195 (2020)

  53. Song, G., Liu, Y., Wang, X.: Revisiting the sibling head in object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11563–11572 (2020)

  54. Cao, J., Cholakkal, H., Anwer, R.M., Khan, F.S., Pang, Y., Shao, L.: D2det: towards high quality object detection and instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11485–11494 (2020)

  55. Tian, Z., Shen, C., Chen, H., He, T.: Fcos: Fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9627–9636 (2019)

  56. Feng, C., Zhong, Y., Gao, Y., Scott, M.R., Huang, W.: Tood: Task-aligned one-stage object detection. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE Computer Society, pp. 3490–3499 (2021)

  57. Xu, S., Wang, X., Lv, W., Chang, Q., Cui, C., Deng, K., Wang, G., Dang, Q., Wei, S., Du, Y., et al.: Pp-yoloe: An evolved version of yolo. arXiv preprint arXiv:2203.16250 (2022)

  58. Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: Unitbox: An advanced object detection network. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520 (2016)

  59. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 107, 3–11 (2018)

    Article  Google Scholar 

  60. Akyon, F.C., Altinuc, S.O., Temizel, A.: Slicing aided hyper inference and fine-tuning for small object detection. In: 2022 IEEE International Conference on Image Processing (ICIP), IEEE, pp. 966–970 (2022)

Download references

Acknowledgements

This work is supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045), the Key Project of Beijing Municipal Commission of Education (KZ201911417048), the Major Project of Technological Innovation 2030 – ”New Generation Artiffcial Intelligence” (2018AAA0100800), the Science and Technology Project of Beijing Municipal Commission of Education (KM202111417009, KM201811417005, the Academic Research Projects of Beijing Union University (No.ZKZD202301).

Author information

Authors and Affiliations

Authors

Contributions

XW: Conceptualization, Methodology, Investigation, Formal Analysis, Writing-Original Draft. NH: Project administration, Supervision, Funding acquisition, Writing-Review and Editing. CH: Software, Writing-review and editing. FS: Software, Formal Analysis, Visualization. WH: Visualization, Data Curation. QW: Ablation experiments, Validation.

Corresponding author

Correspondence to Ning He.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Ethical standards

This article does not contain any studies with human participants or animals performed by any of the ethical standards.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, X., He, N., Hong, C. et al. YOLO-ERF: lightweight object detector for UAV aerial images. Multimedia Systems 29, 3329–3339 (2023). https://doi.org/10.1007/s00530-023-01182-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00530-023-01182-y

Keywords

Navigation