
FeMIP: detector-free feature matching for multimodal images with policy gradient

Abstract

Feature matching for multimodal images is an important task in image processing. However, most methods perform image feature detection, description, and matching sequentially, which leads to information loss, low matching accuracy, and slow runtime. To tackle these challenges, we propose a detector-free method, FeMIP, for feature matching of multimodal images. We design coarse matching and fine regression modules that match multimodal image features accurately in a coarse-to-fine manner. Furthermore, we introduce a novel data augmentation method that enables FeMIP to match features faster and more accurately. The coarse-to-fine module automatically generates pixel-level labels on the original images, so FeMIP can perform pixel-level matching on data with only image-level labels. In addition, we draw on reinforcement learning to design a policy gradient method that better handles the discreteness of the matching step. Extensive experiments show that FeMIP generalizes well and achieves excellent matching performance. The code will be released at: https://github.com/LiaoYun0x0/FeMIP.
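
The policy gradient mentioned above addresses the fact that selecting discrete correspondences is not differentiable. As a rough, hypothetical illustration of that general idea (not the FeMIP implementation; the score matrix, reward function, and baseline below are placeholders), a REINFORCE-style estimator samples matches from a distribution over candidates and weights each sampled match's log-probability by a reward:

    # Minimal REINFORCE-style sketch of policy-gradient supervision for
    # discrete feature matching (illustrative only, not the FeMIP code).
    import torch

    def policy_gradient_matching_loss(score_matrix, reward_fn):
        # score_matrix: (N, M) similarities between N source and M target
        # descriptors; reward_fn maps sampled target indices to a per-source
        # reward, e.g. +1 if the sampled match falls within a reprojection
        # threshold and -1 otherwise.
        probs = torch.softmax(score_matrix, dim=-1)        # matching policy
        dist = torch.distributions.Categorical(probs=probs)
        sampled = dist.sample()                            # (N,) discrete matches
        log_prob = dist.log_prob(sampled)                  # (N,)

        with torch.no_grad():                              # reward is non-differentiable
            reward = reward_fn(sampled)                    # (N,)
            baseline = reward.mean()                       # simple variance reduction

        # REINFORCE: minimize -E[(r - b) * log pi(a)] to maximize expected reward.
        return -((reward - baseline) * log_prob).mean()

    # Toy usage with random stand-in scores and an identity-match reward.
    scores = torch.randn(128, 128, requires_grad=True)
    toy_reward = lambda idx: (idx == torch.arange(128)).float() * 2 - 1
    loss = policy_gradient_matching_loss(scores, toy_reward)
    loss.backward()

Because the reward is computed outside the autograd graph, gradients reach the score matrix only through the log-probabilities, which is what makes the otherwise non-differentiable matching step trainable.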

Data Availability and Access

The datasets analyzed during the current study are available from the following public domain resources: https://mediatum.ub.tum.de/1474000; https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html; http://matthewalunbrown.com/nirscene/nirscene.html; https://github.com/AmberHen/WHU-OPT-SAR-dataset.

Acknowledgements

This work was supported in part by a grant from the Social and Science Foundation of Liaoning Province (No. L20BTQ008), in part by the National Natural Science Foundation of China under Grant 61976124, and in part by the Scientific Research Fund of Yunnan Provincial Education Department under Grant 2021J0007.

Author information

Contributions

Yide Di: Conceptualization and Writing; Yun Liao: Methodology; Hao Zhou: Software; Kaijun Zhu: Validation; Yijia Zhang: Formal analysis; Qing Duan: Investigation; Junhui Liu: Data Curation; Mingyu Lu: Supervision.

Corresponding author

Correspondence to Mingyu Lu.

Ethics declarations

Ethical and informed consent for data used

This submission does not include human or animal research.

Competing Interests

The authors declare that they have no conflicts of interest or competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Di, Y., Liao, Y., Zhou, H. et al. FeMIP: detector-free feature matching for multimodal images with policy gradient. Appl Intell 53, 24068–24088 (2023). https://doi.org/10.1007/s10489-023-04659-5
