
Matching-to-Detecting: Establishing Dense and Reliable Correspondences Between Images

  • Conference paper
  • Conference series: Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14426)

Abstract

We present a novel perspective on local image feature matching, inspired by the complementary strengths of existing methods. Detector-based methods detect predefined keypoints in local regions, which ensures the stability and reliability of the established matches. In contrast, detector-free methods directly match dense features and refine the filtered results, which helps generate more matches. To combine their advantages, we propose a novel Matching-to-Detecting (M2D) process for feature matching: we first perform global reasoning for patch-level matching and then identify discriminative matches within local areas to obtain pixel-level matches. At the patch level, dense matching enables our pipeline to find many matches even in low-texture areas, while at the pixel level our method can be viewed as detecting from a matching perspective, so the established matches are distinctive within local regions and have higher stability and reliability. Experimental results demonstrate that our method outperforms state-of-the-art methods by a significant margin in both matching accuracy and the number of matches. Moreover, the computational complexity of our model is low, making it well suited to real-world applications.
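The paper's M2D implementation is not reproduced here, but the coarse-to-fine idea the abstract describes can be sketched in a few lines: a patch-level stage that matches dense descriptors globally, followed by a pixel-level stage that "detects" the most discriminative pixel inside a local window around each coarse match. The dual-softmax scoring, the window radius `r`, and all function names below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_level_matches(feat_a, feat_b, temperature=0.1):
    """Patch-level step: global dense matching over all patch descriptors,
    keeping mutual nearest neighbours of a dual-softmax score matrix.
    feat_a: (Na, D), feat_b: (Nb, D)."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = (a @ b.T) / temperature            # (Na, Nb) cosine similarities
    conf = softmax(sim, axis=0) * softmax(sim, axis=1)
    i_to_j = conf.argmax(axis=1)
    j_to_i = conf.argmax(axis=0)
    # keep only mutual nearest neighbours (stable, reliable coarse matches)
    return [(i, j) for i, j in enumerate(i_to_j) if j_to_i[j] == i], conf

def refine_to_pixel(fine_a, fine_b, ya, xa, yb, xb, r=2):
    """Pixel-level step: around a coarse match, pick the pixel in B whose
    fine descriptor correlates best with the centre descriptor in A,
    i.e. 'detecting' the match from a matching perspective."""
    q = fine_a[ya, xa]                       # centre descriptor in image A
    h, w, _ = fine_b.shape
    y0, y1 = max(yb - r, 0), min(yb + r + 1, h)
    x0, x1 = max(xb - r, 0), min(xb + r + 1, w)
    corr = (fine_b[y0:y1, x0:x1] * q).sum(axis=-1)   # local correlation map
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    return y0 + dy, x0 + dx
```

Under these assumptions, the first stage can propose matches even in low-texture regions (every patch participates in the global score matrix), while the second stage keeps only the locally most discriminative pixel, mirroring the stability argument made for detector-based methods.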



Acknowledgements

This research was partly supported by grants from the National Natural Science Foundation of China (NSFC, Grant No. 62171281) and the Science and Technology Commission of Shanghai Municipality (STCSM, Grant Nos. 20DZ1200203, 2021SHZDZX0102, 22DZ2229005).

Author information

Correspondence to Hua Yang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Xu, H., Zhou, J., Pan, R., Yang, H., Li, C., Zhao, X. (2024). Matching-to-Detecting: Establishing Dense and Reliable Correspondences Between Images. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_14


  • DOI: https://doi.org/10.1007/978-981-99-8432-9_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8431-2

  • Online ISBN: 978-981-99-8432-9

  • eBook Packages: Computer Science (R0)
