YOLO-ERF: lightweight object detector for UAV aerial images

Wang, Xin; He, Ning; Hong, Chen; Sun, Fengxi; Han, Wenjing; Wang, Qi

doi:10.1007/s00530-023-01182-y

Regular Paper
Published: 19 September 2023

Volume 29, pages 3329–3339, (2023)

Multimedia Systems

Xin Wang¹, Ning He¹, Chen Hong², Fengxi Sun¹, Wenjing Han¹ & Qi Wang²

Abstract

Applying object detection to unmanned aerial vehicles (UAVs) is an important research direction in computer vision. Because object detection in UAV aerial images must meet real-time requirements, a central challenge is the trade-off between the number of network parameters and detection accuracy. To address this problem, this paper proposes a family of lightweight object detectors named YOLO-ERF. First, the paper proposes the effective receptive field (ERF) module, which increases the receptive field of the convolutional kernel while preserving local details. The ERF module is then used to design a lightweight backbone that expands the network receptive field without attaching additional context modules after the backbone. In addition, the proposed detectors use the ERF module to optimize the path aggregation network structure, improving accuracy with fewer network parameters. Finally, a lightweight detection head is proposed to improve small object recognition in complex backgrounds. With these optimizations, the YOLO-ERF models achieved a better trade-off between accuracy and parameters than other mainstream models, with strong results on the VisDrone and COCO datasets. Compared with YOLOv7-Tiny, YOLO-ERF-T reduced the number of network parameters by 40.3% while increasing the average accuracy by 2.4% on VisDrone and 1.9% on COCO.
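The abstract's central claim is that a receptive field can be enlarged without a matching growth in parameters. The abstract does not describe the internals of the ERF module, so the sketch below does not reproduce it. As an assumption, it uses dilated convolution, a standard technique with this property, to illustrate the arithmetic. The function names and the two example layer stacks are hypothetical.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of a stack of conv layers.

    layers: iterable of (kernel, stride, dilation) tuples, input to output.
    """
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf


def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a 2D conv layer (bias omitted); dilation does not change it."""
    return kernel * kernel * c_in * c_out


# Hypothetical stacks: two plain 3x3 convs vs. the same with dilation 2 on the second.
plain = [(3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2)]

print(receptive_field(plain))    # 5
print(receptive_field(dilated))  # 7
print(conv_params(3, 64, 64))    # 36864, identical for both stacks per layer
```

In this illustration both stacks hold the same number of weights, yet the dilated one sees a 7-pixel span instead of 5. Whether the ERF module obtains its gain this way is not stated in the abstract.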

Data availability

The data that support the findings of this study are available from the corresponding author, Ning He, upon reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045), the Key Project of Beijing Municipal Commission of Education (KZ201911417048), the Major Project of Technological Innovation 2030 – "New Generation Artificial Intelligence" (2018AAA0100800), the Science and Technology Project of Beijing Municipal Commission of Education (KM202111417009, KM201811417005), and the Academic Research Projects of Beijing Union University (No. ZKZD202301).

Author information

Authors and Affiliations

College of Smart City, Beijing Union University, Beijing, 100101, China
Xin Wang, Ning He, Fengxi Sun & Wenjing Han
College of Robotics, Beijing Union University, Beijing, 100101, China
Chen Hong & Qi Wang

Contributions

XW: Conceptualization, Methodology, Investigation, Formal Analysis, Writing-Original Draft. NH: Project Administration, Supervision, Funding Acquisition, Writing-Review and Editing. CH: Software, Writing-Review and Editing. FS: Software, Formal Analysis, Visualization. WH: Visualization, Data Curation. QW: Ablation Experiments, Validation.

Corresponding author

Correspondence to Ning He.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Ethical standards

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., He, N., Hong, C. et al. YOLO-ERF: lightweight object detector for UAV aerial images. Multimedia Systems 29, 3329–3339 (2023). https://doi.org/10.1007/s00530-023-01182-y

Received: 05 March 2023
Accepted: 03 September 2023
Published: 19 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00530-023-01182-y
