
Matching-to-Detecting: Establishing Dense and Reliable Correspondences Between Images

  • Conference paper
  • Conference series: Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14426)

Abstract

We present a novel perspective on local image feature matching, inspired by the complementary strengths of existing methods. Detector-based methods detect predefined keypoints in local regions, which ensures the stability and reliability of the established matches. In contrast, detector-free methods directly match dense features and refine the filtered results, which helps generate more matches. To combine their advantages, we propose a novel Matching-to-Detecting (M2D) process for feature matching: we first perform global reasoning for patch-level matching and then identify discriminative matches within local areas to obtain pixel-level matches. At the patch level, dense matching enables our pipeline to find many matches even in low-texture areas, while at the pixel level our method can be viewed as detecting from a matching perspective, so the established matches are distinctive within local regions and have higher stability and reliability. Experimental results demonstrate that our method outperforms state-of-the-art methods by a significant margin in both matching accuracy and the number of matches. Moreover, the computational complexity of our model is low, making it well suited to real-world applications.
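The paper's M2D implementation is not reproduced here, but the coarse-to-fine idea the abstract describes can be sketched in a few lines: a patch-level stage that matches dense descriptors globally, followed by a pixel-level stage that "detects" the most discriminative pixel inside a local window around each coarse match. The dual-softmax scoring, the window radius `r`, and all function names below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_level_matches(feat_a, feat_b, temperature=0.1):
    """Patch-level step: global dense matching over all patch descriptors,
    keeping mutual nearest neighbours of a dual-softmax score matrix.
    feat_a: (Na, D), feat_b: (Nb, D)."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = (a @ b.T) / temperature            # (Na, Nb) cosine similarities
    conf = softmax(sim, axis=0) * softmax(sim, axis=1)
    i_to_j = conf.argmax(axis=1)
    j_to_i = conf.argmax(axis=0)
    # keep only mutual nearest neighbours (stable, reliable coarse matches)
    return [(i, j) for i, j in enumerate(i_to_j) if j_to_i[j] == i], conf

def refine_to_pixel(fine_a, fine_b, ya, xa, yb, xb, r=2):
    """Pixel-level step: around a coarse match, pick the pixel in B whose
    fine descriptor correlates best with the centre descriptor in A,
    i.e. 'detecting' the match from a matching perspective."""
    q = fine_a[ya, xa]                       # centre descriptor in image A
    h, w, _ = fine_b.shape
    y0, y1 = max(yb - r, 0), min(yb + r + 1, h)
    x0, x1 = max(xb - r, 0), min(xb + r + 1, w)
    corr = (fine_b[y0:y1, x0:x1] * q).sum(axis=-1)   # local correlation map
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    return y0 + dy, x0 + dx
```

Under these assumptions, the first stage can propose matches even in low-texture regions (every patch participates in the global score matrix), while the second stage keeps only the locally most discriminative pixel, mirroring the stability argument made for detector-based methods.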



Acknowledgements

This research was partly supported by grants from the National Natural Science Foundation of China (NSFC, Grant No. 62171281) and the Science and Technology Commission of Shanghai Municipality (STCSM, Grant Nos. 20DZ1200203, 2021SHZDZX0102, 22DZ2229005).

Author information

Correspondence to Hua Yang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Xu, H., Zhou, J., Pan, R., Yang, H., Li, C., Zhao, X. (2024). Matching-to-Detecting: Establishing Dense and Reliable Correspondences Between Images. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_14


  • DOI: https://doi.org/10.1007/978-981-99-8432-9_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8431-2

  • Online ISBN: 978-981-99-8432-9

  • eBook Packages: Computer Science (R0)
