Small target detection algorithm based on attention mechanism and data augmentation

Wang, Jiuxin; Liu, Man; Su, Yaoheng; Yao, Jiahui; Du, Yurong; Zhao, Minghu; Lu, Dingze

doi:10.1007/s11760-024-03046-y

Original Paper
Published: 26 February 2024

Volume 18, pages 3837–3853, (2024)
Signal, Image and Video Processing

Abstract

Mask detection is of great significance for preventing occupational diseases, including infectious diseases and dust-related diseases. To address the small target size, large number of targets, and mutual occlusion encountered in mask-wearing detection, this paper proposes a mask-wearing detection algorithm based on an improved YOLOv5s. First, the ultra-lightweight attention module ECA is embedded in the neck layer to improve the accuracy of the model. Second, the influence of different loss functions (GIoU, CIoU, and DIoU) on the improved model is explored, and CIoU is selected as its loss function. In addition, the improved model adopts label smoothing, which improves the generalization ability of the model and reduces the risk of overfitting. Finally, the influence of the data augmentation methods Mosaic and Mixup on model performance is examined, and the optimal data augmentation weight is determined. Tested on the verification set, the proposed model achieves a mean average precision (mAP), precision, and recall of 92.1%, 90.3%, and 87.4%, respectively. The mAP of the improved algorithm is 4.4% higher than that of the original algorithm.
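
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of the CIoU loss and of label smoothing, written in plain Python from the standard published formulas. It is not the authors' implementation: the function names `ciou_loss` and `smooth_labels`, the `(x1, y1, x2, y2)` box format, and the default `epsilon=0.1` are assumptions made for illustration only.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss (1 - CIoU) for two boxes in (x1, y1, x2, y2) format.

    Hypothetical helper for illustration; the box format is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (
        math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)


def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: y * (1 - epsilon) + epsilon / K for K classes.

    epsilon=0.1 is an illustrative default, not a value from the paper.
    """
    k = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / k for y in one_hot]
```

In this sketch, two identical boxes give a loss close to 0, while two disjoint boxes give a loss above 1 because the center-distance penalty is added on top of the zero IoU. That penalty is what lets CIoU provide a useful gradient even when small targets do not yet overlap their ground truth.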

Data availability

The raw/processed data required to reproduce these findings cannot be shared at this time due to technical or time limitations. Please contact the corresponding author for further assistance.

Acknowledgements

We thank Dr. Hao Hongjuan for helping us to make the weld data set. Dr. Wang Qiuping has provided us with many research foundations, such as plates with welds.

Funding

This work was supported by the Natural Science Foundation of Shaanxi Province, China (grant No. 2022JM-033), and the 2023 Graduate Innovation Fund Project of Xi'an Polytechnic University (grant No. chx2023026).

Author information

Authors and Affiliations

School of Science, Xi’an Polytechnic University, Xi’an, 710048, China
Jiuxin Wang, Man Liu, Yaoheng Su, Jiahui Yao, Yurong Du, Minghu Zhao & Dingze Lu

Contributions

ML performed methodology, software, formal analysis, and investigation. JY and JW provided conceptualization, methodology, validation, and supervision. YD and DL did methodology, validation, and supervision. MZ approved validation, resources, supervision, and writing—review and editing. YS carried out supervision, writing—original draft, writing—review & editing, and funding acquisition.

Corresponding author

Correspondence to Yaoheng Su.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships which could have appeared to influence the work reported in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Man Liu contributed equally with the first author.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Liu, M., Su, Y. et al. Small target detection algorithm based on attention mechanism and data augmentation. SIViP 18, 3837–3853 (2024). https://doi.org/10.1007/s11760-024-03046-y

Received: 24 June 2023
Revised: 16 December 2023
Accepted: 23 January 2024
Published: 26 February 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s11760-024-03046-y
