Abstract
Marine object detection has become increasingly important in intelligent underwater robot. Because of color cast and blur in underwater images, features directly extracted from backbone networks usually lack interesting and discriminative characters, that affects performance on marine object detection. To this end, this paper proposes a novel refined marine object detector with attention-based spatial pyramid pooling networks and bidirectional feature fusion strategy to relieve the weakening of features and address marine object detection issues. Firstly, an attention-based spatial pyramid pooling network named as SA-SPPN is proposed to enrich interesting information and extend receptive field on original features extracted from backbone network. Based on enhanced multiple level features, the bidirectional feature fusion strategy is designed to fuse different level features and generate robust feature maps for detection. Specifically, the top-down connection could transfer semantic information from high-level features to enhance low-level features. The bottom-up pathway could extend resolution of high-level features. Furthermore, the cross-layer connections are integrated into both top-down passway and bottom-up passway to carry out multiple branch fusion. On bounding boxes regression phase, the distance-IoU loss is adopted to improve regression speed and accuracy. Finally, this paper conducts series experiments on underwater image datasets and URPC datasets to detect marine objects. The experimental results reveal that our approach could achieve impressive performance and reach 79.64% mAP on underwater image datasets, 79.31% mAP on URPC2019 datasets and 79.93% mAP on URPC2020 datasets, respectively. For standard object detection, the proposed algorithm also could realize notable performance and get 81.9% mAP on PASCAL VOC datasets.
Similar content being viewed by others
Notes
Underwater Robot Picking Contest. http://www.cnurpc.org/.
References
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Wei L, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Alexander CB (2016) Ssd: single shot multibox detector. European conference on computer vision. Springer, New York, pp 21–37
Fu C-Y, Liu W, Ranga A, Tyagi A, Berg Lexander AC (2017) Dssd: deconvolutional single shot detector. arXiv preprint arXiv:1701.06659
Shen Z, Liu Z, Li J, Jiang Y-G, Chen Y, Xue X (2017) Dsod: learning deeply supervised object detectors from scratch. In: Proceedings of the IEEE international conference on computer vision, pp 1919–1927
Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767
Ma X, Jia W, Xue S, Yang J, Zhou C, Sheng QZ, et al (2021) A comprehensive survey on graph anomaly detection with deep learning. IEEE Trans Knowl Data Eng
Liu F, Xue S, Wu J, Zhou C, Hu W, Paris C, Nepal S, Yang J, Yu PS (2020) Deep learning for community detection: progress, challenges and opportunities. arXiv preprint arXiv:2005.08225
Su X, Xue S, Liu F, Wu J, Yang J, Zhou C, Hu W, Paris C, Nepal S, Jin D, et al (2021) A comprehensive survey on community detection with deep learning. arXiv preprint arXiv:2105.12584
Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7794–7803
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Woo S, Park J, Lee J, Kweon SI (2018) Cbam: convolutional block attention module. pp 3–19
Wang H, Peng J, Zhao Y, Fu X (2020) Multi-path deep cnns for fine-grained car recognition. IEEE Trans Vehic Technol 99:1
Wang H, Peng J, Chen D, Jiang G, Zhao T, Fu X (2020) Attribute-guided feature learning network for vehicle re-identification. IEEE MultiMedia 27(4):112–121
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759–8768
Ghiasi G, Lin T-Y, Le QV (2019) Nas-fpn: learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7036–7045
Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10781–10790
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. In: Proceedings of the 24th ACM international conference on multimedia, pp 516–520
Rezatofighi H, Tsoi N, Gwak JY, Sadeghian A, Reid I, Savarese S (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2020) Distance-iou loss: faster and better learning for bounding box regression. In: The thirty-fourth AAAI conference on artificial intelligence, AAAI 2020, the thirty-second innovative applications of artificial intelligence conference, IAAI 2020, the tenth AAAI symposium on educational advances in artificial intelligence, EAAI 2020, New York, NY, USA, February 7–12, 2020, pp 12993–13000. AAAI Press
Everingham M, Van Gool L, Williams Christopher KI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338
Russakovsky O, Deng J, Hao S, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Dai J, Li Y, He K, Sun J (2016) R-fcn: object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387
Gidaris S, Komodakis N (2015) Object detection via a multi-region and semantic segmentation-aware cnn model. In: Proceedings of the IEEE international conference on computer vision, pp 1134–1142
Fengqiang X, Wang H, Peng J, Xianping F (2021) Scale-aware feature pyramid architecture for marine object detection. Neural Comput Appl 33(8):3637–3653
Shen Z, Shi H, Yu J, Phan H, Feris R, Cao L, Liu D, Wang X, Huang T, Savvides M (2017) Improving object detection from scratch via gated feature reuse. arXiv: 1712.00886
Bochkovskiy A, Wang CY, Liao H (2020) Yolov4: optimal speed and accuracy of object detection
Jocher G, et al (2021) yolov5. https://github.com/ultralytics/yolov5
Larochelle H, Hinton G (2010) Learning to combine foveal glimpses with a third-order boltzmann machine. In: Proceedings of the 23rd international conference on neural information processing systems, volume 1, NIPS’10, pp 1243–1251, Red Hook, NY, USA, Curran Associates Inc
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN , Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
Cao Y, Xu J, Lin S, Wei F, Hu H (2019) Gcnet: non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV) workshops
Cai Z, Fan Q, Feris RS, Vasconcelos N (2016) A unified multi-scale deep convolutional neural network for fast object detection. In: Bastian L, Jiri M, Nicu S, Max W (eds) Computer vision: ECCV 2016. Springer, Cham, pp 354–370
Wang H, Wang Y, Zhang Z, Fu X, Wang M (2019)Kernelized multiview subspace analysis by self-weighted learning
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China Grant 62176037 and 62002041, by Liaoning Revitalization Talents Program XLYC1908007, by the Dalian Science and Technology Innovation Fund 2021JJ12GX028, by the Liaoning Doctoral Research Start-up Fund Project Grant 2021-BS-075.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Xu, F., Wang, H., Sun, X. et al. Refined marine object detector with attention-based spatial pyramid pooling networks and bidirectional feature fusion strategy. Neural Comput & Applic 34, 14881–14894 (2022). https://doi.org/10.1007/s00521-022-07264-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-022-07264-8