
Spatial hierarchy perception and hard samples metric learning for high-resolution remote sensing image object detection

Applied Intelligence

Abstract

Because of differing shooting angles, altitudes, and scenes, remote sensing images contain complex backgrounds and multi-scale objects. Moreover, objects in remote sensing images are small relative to the background and are easily occluded by buildings and trees. These factors make feature extraction difficult and increase the intra-class diversity of objects, which makes object detection in remote sensing images more challenging. In this paper, we propose a novel remote sensing image object detection method (SHDet) based on a spatial hierarchy perception component (SHPC) and hard samples metric learning (HSML). The SHPC extracts features at different spatial hierarchies and learns contribution weights across feature channels to enhance the feature representation. HSML narrows the feature differences among hard samples of the same category, reducing the detection errors caused by intra-class diversity. In addition, we decouple the complex background to build pre-training datasets for pre-training the object detection model, which strengthens object feature learning. Experiments on two widely used remote sensing datasets (NWPU VHR-10 and DOTA-v1.5) show that the proposed method achieves better detection performance than several state-of-the-art object detection methods.
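The abstract does not give the HSML loss itself, so the following is only an illustrative sketch of the general idea of narrowing feature differences among hard samples of one category. It assumes, purely for illustration, that "hard" samples are those farthest from their class centroid and that the penalty is their mean squared distance to that centroid. The function names and the `hard_fraction` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def class_centroids(features, labels):
    """Return the mean feature vector of each class."""
    groups = defaultdict(list)
    for vec, label in zip(features, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_sample_pull_loss(features, labels, hard_fraction=0.5):
    """Illustrative intra-class loss (an assumption, not the paper's HSML).

    For each class, the samples farthest from the class centroid are
    treated as hard, and their mean squared distance to the centroid is
    penalised. The result is averaged over classes.
    """
    centroids = class_centroids(features, labels)
    per_class = defaultdict(list)
    for vec, label in zip(features, labels):
        per_class[label].append(squared_distance(vec, centroids[label]))

    losses = []
    for dists in per_class.values():
        # Keep at least one sample per class as the "hard" subset.
        k = max(1, math.ceil(hard_fraction * len(dists)))
        hardest = sorted(dists, reverse=True)[:k]
        losses.append(sum(hardest) / k)
    return sum(losses) / len(losses)


if __name__ == "__main__":
    feats = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
    labs = ["ship", "ship", "plane"]
    print(hard_sample_pull_loss(feats, labs))
```

Minimising a term of this kind pulls outlying samples of a class toward the rest of that class, which is one plausible reading of reducing intra-class diversity. The actual formulation in the full article may differ.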


Acknowledgements

This work was supported by the State’s Key Project of Research and Development Plan of China (No. 2016YFC0600908), the National Natural Science Foundation of China (No. 61806206, 61772530), the Natural Science Foundation of Jiangsu Province (No. BK20180639, BK20201346, BK20171192), and the Six Talent Peaks Project in Jiangsu Province (No. 2015-DZXX-010, 2018-XYDXX-044).

Author information

Corresponding author

Correspondence to Shixiong Xia.

About this article

Cite this article

Zhu, D., Xia, S., Zhao, J. et al. Spatial hierarchy perception and hard samples metric learning for high-resolution remote sensing image object detection. Appl Intell 52, 3193–3208 (2022). https://doi.org/10.1007/s10489-021-02335-0

  • DOI: https://doi.org/10.1007/s10489-021-02335-0
