
Enhanced feature Fusion structure of YOLO v5 for detecting small defects on metal surfaces

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

To improve the detection of small defects on the metal base surface of an infrared laser sensor, track the fluctuation and distribution of product quality, and form a closed loop between production control and quality improvement, the You Only Look Once (YOLO) v5s object detection algorithm was further improved in this study. Specifically, the same-scale feature fusion stage, which is easily overlooked in the network structure, was strengthened to enhance detection performance. In deep-learning object detection, the propagation of information between features is important; it is usually handled by the feature pyramid in the neck of the model to enhance feature fusion. First, this study proposed a cross-convolution feature-strengthening connection between the backbone and the neck, which shortened the information propagation path and enriched the semantic information exchanged between feature pyramid levels. Then, the concat module of the original network was redesigned into an enhanced feature concat module that strengthens the fusion of same-scale features: attention units built from the convolutional block attention module (CBAM) were integrated into the concat module so that the network learns channel weights independently, improving information transmission between features and the detection of small objects. Finally, the K-means++ algorithm was applied to the self-built Infrared Laser Sensor Metal Base (ILS-MB) dataset to generate new anchor boxes suited to the small objects in this dataset, improving how well the anchors match the targets. With a small increase in computational cost, the improved YOLO v5s algorithm raised accuracy by 3.8% on the ILS-MB dataset and compared favorably with other state-of-the-art detection methods.
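The enhanced concat module described in the abstract is not reproduced in this preview, but the general idea of attention-gated same-scale fusion can be sketched. The following minimal PyTorch example is an assumption about how CBAM-style channel and spatial attention might be attached to a concatenation of same-scale feature maps; the class names, channel counts, and placement are illustrative only and are not the authors' code.

```python
# A minimal PyTorch sketch of attention-gated same-scale fusion: features are
# concatenated and then reweighted by CBAM-style channel and spatial attention
# (Woo et al., ECCV 2018). Class names and placement are assumptions made for
# illustration; this is not the module released with the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP applied to both average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx)               # (B, C, 1, 1) channel weights


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)     # (B, 1, H, W)
        mx, _ = torch.max(x, dim=1, keepdim=True)    # (B, 1, H, W)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class AttentionConcat(nn.Module):
    """Concatenate same-scale feature maps, then let CBAM reweight the result."""

    def __init__(self, total_channels: int):
        super().__init__()
        self.ca = ChannelAttention(total_channels)
        self.sa = SpatialAttention()

    def forward(self, features):                     # list of (B, Ci, H, W) tensors
        x = torch.cat(features, dim=1)               # plain concat, as in YOLOv5
        x = x * self.ca(x)                           # learned per-channel weights
        x = x * self.sa(x)                           # learned per-location weights
        return x


if __name__ == "__main__":
    # Two same-scale maps, e.g. one from the backbone and one from the neck.
    a, b = torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40)
    print(AttentionConcat(256)([a, b]).shape)        # torch.Size([1, 256, 40, 40])
```

Likewise, the anchor-regeneration step can be illustrated with a generic K-means++ clustering of labelled box sizes. The ILS-MB annotations are not public, so `boxes_wh` below is a hypothetical array of (width, height) pairs; note that YOLO pipelines often cluster with an IoU-based distance rather than the Euclidean distance scikit-learn uses, so this is purely a sketch of the named technique.

```python
# A hedged sketch of anchor regeneration with K-means++ on labelled box sizes.
# "boxes_wh" is a hypothetical (N, 2) array of ground-truth (width, height)
# pairs; the ILS-MB annotations themselves are not publicly available.
import numpy as np
from sklearn.cluster import KMeans


def kmeanspp_anchors(boxes_wh: np.ndarray, n_anchors: int = 9) -> np.ndarray:
    """Cluster (width, height) pairs into n_anchors anchor boxes."""
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=0)
    km.fit(boxes_wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]  # sort by area, small to large


# Synthetic example biased toward small boxes, standing in for real labels.
rng = np.random.default_rng(0)
print(kmeanspp_anchors(rng.uniform(4, 64, size=(500, 2))))
```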


Data Availability

The ILS-MB dataset supporting the results of this study was used under license for the current study only, so the data are not publicly available.


Acknowledgements

This work was supported by the Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology (FMZ201901) and the National Natural Science Foundation of China “Research on bionic chewing robot for physical property detection and evaluation of food materials” (51375209).

Author information


Corresponding author

Correspondence to Jinghu Yu.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, X., Liu, J., Zhou, X. et al. Enhanced feature Fusion structure of YOLO v5 for detecting small defects on metal surfaces. Int. J. Mach. Learn. & Cyber. 14, 2041–2051 (2023). https://doi.org/10.1007/s13042-022-01744-y

