Refined marine object detector with attention-based spatial pyramid pooling networks and bidirectional feature fusion strategy

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Marine object detection has become increasingly important for intelligent underwater robots. Because of the color cast and blur in underwater images, features extracted directly from backbone networks usually lack salient and discriminative characteristics, which degrades marine object detection performance. To this end, this paper proposes a novel refined marine object detector with attention-based spatial pyramid pooling networks and a bidirectional feature fusion strategy to mitigate the weakening of features and address marine object detection issues. First, an attention-based spatial pyramid pooling network, named SA-SPPN, is proposed to enrich salient information and enlarge the receptive field of the original features extracted from the backbone network. Building on the enhanced multi-level features, the bidirectional feature fusion strategy is designed to fuse features from different levels and generate robust feature maps for detection. Specifically, the top-down pathway transfers semantic information from high-level features to enhance low-level features, while the bottom-up pathway supplements high-level features with higher-resolution detail. Furthermore, cross-layer connections are integrated into both the top-down and bottom-up pathways to carry out multi-branch fusion. In the bounding box regression phase, the distance-IoU loss is adopted to improve regression speed and accuracy. Finally, this paper conducts a series of experiments on an underwater image dataset and the URPC datasets to detect marine objects. The experimental results show that our approach achieves impressive performance, reaching 79.64% mAP on the underwater image dataset, 79.31% mAP on URPC2019, and 79.93% mAP on URPC2020. For standard object detection, the proposed algorithm also performs well, reaching 81.9% mAP on the PASCAL VOC dataset.
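As a concrete illustration of the attention-based spatial pyramid pooling idea described above, the following is a minimal PyTorch sketch. The abstract does not give the exact SA-SPPN layout, so this assumes a YOLO-style SPP block (parallel max-pooling branches that enlarge the receptive field without changing resolution) followed by SE-style channel attention over the concatenated branches; all module and parameter names here are illustrative, not taken from the paper.

# Illustrative sketch of an attention-augmented spatial pyramid pooling block.
# The exact SA-SPPN design is not specified in the abstract; this assumes SPP
# with parallel max-pooling branches followed by SE-style channel attention.
import torch
import torch.nn as nn


class AttentionSPP(nn.Module):
    def __init__(self, channels, pool_sizes=(5, 9, 13), reduction=16):
        super().__init__()
        # Parallel max-pooling branches with different receptive fields;
        # stride 1 and padding keep the spatial resolution unchanged.
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )
        fused = channels * (len(pool_sizes) + 1)  # pooled branches + identity
        # SE-style channel attention over the concatenated branches.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, kernel_size=1),
            nn.Sigmoid(),
        )
        # 1x1 convolution projects back to the input channel count.
        self.project = nn.Conv2d(fused, channels, kernel_size=1)

    def forward(self, x):
        branches = [x] + [pool(x) for pool in self.pools]
        fused = torch.cat(branches, dim=1)
        fused = fused * self.attention(fused)  # reweight channels
        return self.project(fused)


if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)        # a backbone feature map
    print(AttentionSPP(256)(feat).shape)      # torch.Size([1, 256, 20, 20])

For the bounding box regression loss mentioned above, the distance-IoU (DIoU) loss of Zheng et al. adds a normalized center-distance penalty to the IoU term,

$$\mathcal{L}_{\text{DIoU}} = 1 - \mathrm{IoU} + \frac{\rho^2\!\left(\mathbf{b}, \mathbf{b}^{gt}\right)}{c^2},$$

where $\rho(\cdot)$ is the Euclidean distance between the centers of the predicted box $\mathbf{b}$ and the ground-truth box $\mathbf{b}^{gt}$, and $c$ is the diagonal length of the smallest box enclosing both. The penalty term yields a useful gradient even when the two boxes do not overlap, which is what improves regression speed and accuracy.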


Notes

  1. Underwater Robot Picking Contest. http://www.cnurpc.org/.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62176037 and 62002041, by the Liaoning Revitalization Talents Program under Grant XLYC1908007, by the Dalian Science and Technology Innovation Fund under Grant 2021JJ12GX028, and by the Liaoning Doctoral Research Start-up Fund Project under Grant 2021-BS-075.

Author information


Corresponding author

Correspondence to Xianping Fu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Xu, F., Wang, H., Sun, X. et al. Refined marine object detector with attention-based spatial pyramid pooling networks and bidirectional feature fusion strategy. Neural Comput & Applic 34, 14881–14894 (2022). https://doi.org/10.1007/s00521-022-07264-8
