
Grownbb: Gromov–Wasserstein learning of neural best buddies for cross-domain correspondence

Research article · Published in The Visual Computer

Abstract

Identifying pixel correspondences between two images is a fundamental task in computer vision and has been widely used for 3D reconstruction, image morphing, and image retrieval. Neural best buddies (NBB) finds sparse correspondences between cross-domain images whose local structures are semantically related, even though the images may differ considerably in both semantics and appearance. This paper presents a new method for cross-domain image correspondence, called GroWNBB, which incorporates Gromov–Wasserstein learning into the NBB framework. Specifically, we use NBB as the backbone to search for feature matches in deep layers and propagate them to shallow layers. At each layer, we modify the NBB strategy by mapping the matching pairs obtained within and across images into graphs, formulating matching as optimal transport between these graphs, and using Gromov–Wasserstein learning to establish matches between them. Consequently, our approach accounts for the relationships between images as well as those within each image, which makes the correspondences more stable. Our experiments demonstrate that GroWNBB achieves state-of-the-art performance on cross-domain correspondence and outperforms other popular methods on intra-class and same-object correspondence estimation. Our code is available at https://github.com/NolanInLowland/GroWNBB.



Acknowledgements

The authors would like to thank the editors and the anonymous reviewers for their constructive comments and suggestions. This paper is supported by the National Natural Science Foundation of China (Grant Nos. 61972264, 62072312, 62372302) and Natural Science Foundation of Shenzhen (Grant No. 20200807165235002).

Author information

Authors and Affiliations

Authors

Contributions

Ruolan Tang contributed to the conceptualization, methodology, and writing—original draft; Weiwei Wang contributed to the conceptualization, writing—review, editing, and supervision; Yu Han contributed to the writing—review and editing; Xiangchu Feng contributed to the conceptualization.

Corresponding author

Correspondence to Weiwei Wang.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tang, R., Wang, W., Han, Y. et al. Grownbb: Gromov–Wasserstein learning of neural best buddies for cross-domain correspondence. Vis Comput (2024). https://doi.org/10.1007/s00371-023-03251-9

