A deep feature matching pipeline with triple search strategy

The Journal of Supercomputing

Abstract

Local feature matching between images is a challenging task. Current research tends to pursue higher matching accuracy at the cost of more time and resources, for example by using multilayer search strategies. Low-time-consumption methods, on the other hand, match less accurately: a coarse-to-fine strategy, for instance, loses the information in many feature maps. To address this trade-off, we propose a matching pipeline that balances matching accuracy and time consumption. The pipeline uses a triple search strategy that searches three feature maps for local feature matches, achieving both higher matching accuracy than the coarse-to-fine method and lower computational complexity than the hierarchical strategy. A pre-trained network serves as the backbone and generates feature maps from different layers. We first collect the coarse matches and geometric transformations from the coarse feature maps. Local feature maps centered on the matched points are then cropped from the middle feature maps for refinement matching. Finally, the refined middle matches are positioned on the fine-layer feature map with high accuracy. Extensive experiments on the HPatches, IMC2020, and Aachen Day–Night datasets demonstrate the effectiveness of the proposed pipeline, which is competitive with current state-of-the-art methods.
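
The coarse, middle, and fine search steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Random arrays stand in for the backbone's feature maps, the factor of 2 between levels and the 3×3 search window are assumptions, and the geometric transformation estimated at the coarse level is omitted. The function names `mutual_nn`, `refine`, and `triple_search` are hypothetical.

```python
import numpy as np


def l2_normalize(f, eps=1e-8):
    """Scale descriptors along the last axis to unit length."""
    return f / (np.linalg.norm(f, axis=-1, keepdims=True) + eps)


def mutual_nn(fa, fb):
    """Mutual nearest neighbours between two (H, W, C) feature maps.

    Returns two (N, 2) integer arrays of (row, col) positions in A and B.
    """
    h_a, w_a, c = fa.shape
    h_b, w_b, _ = fb.shape
    da = l2_normalize(fa.reshape(-1, c))
    db = l2_normalize(fb.reshape(-1, c))
    sim = da @ db.T                      # cosine similarity, all pairs
    ab = sim.argmax(axis=1)              # best B index for each A cell
    ba = sim.argmax(axis=0)              # best A index for each B cell
    ia = np.arange(da.shape[0])
    keep = ba[ab] == ia                  # keep only mutual agreements
    ia, ib = ia[keep], ab[keep]
    pa = np.stack(np.unravel_index(ia, (h_a, w_a)), axis=1)
    pb = np.stack(np.unravel_index(ib, (h_b, w_b)), axis=1)
    return pa, pb


def refine(pa, pb, fa, fb, scale=2, radius=1):
    """Carry matches to a map `scale` times finer (assumed factor).

    The point in A is upscaled directly; the point in B is searched
    only inside a (2 * radius + 1)^2 window around its upscaled position.
    """
    h_b, w_b, _ = fb.shape
    fa_n, fb_n = l2_normalize(fa), l2_normalize(fb)
    out_a, out_b = [], []
    for (ra, ca), (rb, cb) in zip(pa * scale, pb * scale):
        query = fa_n[ra, ca]
        r0, r1 = max(rb - radius, 0), min(rb + radius + 1, h_b)
        c0, c1 = max(cb - radius, 0), min(cb + radius + 1, w_b)
        scores = fb_n[r0:r1, c0:c1] @ query   # local window only
        dr, dc = np.unravel_index(scores.argmax(), scores.shape)
        out_a.append((ra, ca))
        out_b.append((r0 + dr, c0 + dc))
    return (np.array(out_a, dtype=int).reshape(-1, 2),
            np.array(out_b, dtype=int).reshape(-1, 2))


def triple_search(pyr_a, pyr_b):
    """Match on the coarse map, then refine on the middle and fine maps."""
    (coarse_a, mid_a, fine_a), (coarse_b, mid_b, fine_b) = pyr_a, pyr_b
    pa, pb = mutual_nn(coarse_a, coarse_b)
    pa, pb = refine(pa, pb, mid_a, mid_b)
    pa, pb = refine(pa, pb, fine_a, fine_b)
    return pa, pb


# Toy stand-in for backbone features: 4x4, 8x8 and 16x16 maps, 16 channels.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((s, s, 16)) for s in (4, 8, 16)]
matches_a, matches_b = triple_search(pyramid, pyramid)
print(len(matches_a), "matches on the fine map")
```

Only the coarse level compares every cell with every other cell. The two refinement steps each score a small window per match, which is the source of the cost saving over an exhaustive search at every level.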

Availability of data and materials

Research data are not shared.

Funding

This work was supported by the Key Area Research and Development Program of Guangdong Province (Grant No. 2020B0909020001) and the National Natural Science Foundation of China (Grant No. 61573113).

Author information

Contributions

SF wrote the main manuscript text. HW and HQ revised the grammar. All authors reviewed the manuscript.

Corresponding author

Correspondence to Huaming Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

This declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Feng, S., Qian, H. & Wang, H. A deep feature matching pipeline with triple search strategy. J Supercomput 79, 20878–20898 (2023). https://doi.org/10.1007/s11227-023-05418-6

  • DOI: https://doi.org/10.1007/s11227-023-05418-6
