Siamese transformer network-based similarity metric learning for cross-source remote sensing image retrieval

  • Original Article
Neural Computing and Applications

Abstract

As a fundamental technique for mining and analyzing remote sensing (RS) big data, content-based remote sensing image retrieval (CBRSIR) has received considerable attention. Recently, cross-source CBRSIR (CS-CBRSIR) has become one of the most challenging tasks in the RS community. Because of the data drift issue, it is hard to find a similarity metric function that accurately measures the similarity between RS images from different sources. To address this issue, instead of directly using manually designed similarity metrics, we propose an end-to-end similarity metric learning network, the Siamese Transformer Network (STN), for CS-CBRSIR. Specifically, the proposed STN consists of three modules: (1) a feature extraction module, which is a network combining the Vision Transformer (ViT) with convolution layers, named ConViT; (2) a similarity metric function, which is a fully connected neural network (FCNN) that computes the similarity between the output features from different sources; and (3) a smooth average-precision (Smooth-AP) loss function, which serves as a surrogate for the standard AP metric so that the similarity metric function can be optimized through backpropagation. Afterward, the learned similarity metric function can be used to perform CS-CBRSIR accurately. Extensive experiments and ablation studies demonstrate that the proposed approach achieves promising performance in the CS-CBRSIR task.
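To make the role of the Smooth-AP loss more concrete, the following is a minimal sketch of a smoothed average precision for a single query. It assumes the usual Smooth-AP relaxation, in which the step function inside the rank computation is replaced by a temperature-scaled sigmoid. The function names, the temperature default, and the toy scores are illustrative assumptions and are not taken from the authors' implementation. The plain-Python version only evaluates the loss value; in training, the same expression would be written in an automatic differentiation framework.

```python
import math


def sigmoid(x: float, temperature: float) -> float:
    """Temperature-scaled logistic function that relaxes the step function."""
    z = x / temperature
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_ap(scores, labels, temperature=0.01):
    """Smoothed average precision for one query (hypothetical helper).

    scores: similarity of each gallery image to the query (higher = closer).
    labels: 1 if the gallery image is relevant to the query, otherwise 0.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    if not positives:
        raise ValueError("at least one relevant item is required")
    total = 0.0
    for i in positives:
        # Soft rank of item i among the relevant items only.
        rank_pos = 1.0 + sum(
            sigmoid(scores[j] - scores[i], temperature)
            for j in positives if j != i
        )
        # Soft rank of item i among all gallery items.
        rank_all = 1.0 + sum(
            sigmoid(scores[j] - scores[i], temperature)
            for j in range(len(scores)) if j != i
        )
        total += rank_pos / rank_all
    return total / len(positives)


def smooth_ap_loss(scores, labels, temperature=0.01):
    """Loss to minimize: one minus the smoothed average precision."""
    return 1.0 - smooth_ap(scores, labels, temperature)


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    good = [0.9, 0.8, 0.1, 0.2]  # relevant items ranked first
    bad = [0.1, 0.2, 0.9, 0.8]   # relevant items ranked last
    print(smooth_ap_loss(good, labels), smooth_ap_loss(bad, labels))
```

With a small temperature the smoothed value approaches the exact AP, while a larger temperature gives a smoother but looser approximation. In this sketch the scores stand in for the outputs of a learned similarity function applied to query and gallery features.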

Notes

  1. https://github.com/Andrew-Brown1/Smooth_AP.

  2. ViT-Pytorch toolkit is available at https://github.com/lucidrains/vit-pytorch.

  3. PatternNet is available at https://sites.google.com/view/zhouwx/dataset.

  4. FAIR1M is available at http://gaofen-challenge.com/indexpage.

  5. DSRSID dataset is available from Baidu Cloud Storage at https://pan.baidu.com/s/15ZWaZ2yArnvwcwtead_rpQ.

Author information

Corresponding author

Correspondence to Zhili Zhou.

Ethics declarations

Funding

This work is supported in part by the National Natural Science Foundation of China under Grant 61972205, Grant U1936218, and Grant U20A20176, in part by the Guangdong Natural Science Funds for Distinguished Young Scholar, and in part by the Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET) fund, China.

Conflicts of interest

The authors declare no conflicts of interest.

Data availability statement

Data are available on request due to privacy or other restrictions.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ding, C., Wang, M., Zhou, Z. et al. Siamese transformer network-based similarity metric learning for cross-source remote sensing image retrieval. Neural Comput & Applic 35, 8125–8142 (2023). https://doi.org/10.1007/s00521-022-08092-6

  • DOI: https://doi.org/10.1007/s00521-022-08092-6
