Abstract
Feature matching for multimodal images is an important task in image processing. However, most methods perform feature detection, description, and matching sequentially, which results in a large loss, low matching accuracy, and slow running speed. To tackle these challenges, we propose FeMIP, a detector-free method for feature matching of multimodal images. We design a coarse matching module and a fine regression module that together produce accurate multimodal feature matches in a coarse-to-fine manner. Furthermore, we introduce a novel data augmentation method that enables FeMIP to match features faster and more accurately. The coarse-to-fine module automatically generates pixel-level labels on the original image, enabling FeMIP to perform pixel-level matching on data with only image-level labels. In addition, we draw on reinforcement learning to design a policy gradient method that better handles the discreteness of the matching step. Extensive experiments show that FeMIP generalizes well and achieves excellent matching performance. The code will be released at: https://github.com/LiaoYun0x0/FeMIP.
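The abstract does not spell out how a policy gradient sidesteps the non-differentiable choice of a discrete match. As a rough illustration only, the sketch below applies the generic score-function (REINFORCE) estimator, the same family of technique used by DISK (Tyszkiewicz et al., NeurIPS 2020), to rows of a descriptor-similarity matrix. It is not FeMIP's implementation: the dot-product logits, the temperature, the +1/-1 reward, the toy descriptors, and every function name are hypothetical choices made for the example.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of matching logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def match_logits(desc_a, desc_b, temperature: float = 0.1) -> List[List[float]]:
    """Dot-product similarity of every descriptor in A with every descriptor in B,
    divided by a hypothetical temperature. Row i holds the logits for query i."""
    return [[sum(x * y for x, y in zip(a, b)) / temperature for b in desc_b]
            for a in desc_a]


def reward(sampled: int, gt: int, penalty: float = 1.0) -> float:
    """Hypothetical reward: +1 for a correct match, -penalty for a wrong one."""
    return 1.0 if sampled == gt else -penalty


def sample_match(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one discrete match index; no gradient passes through this step."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_grad(logits_row: Sequence[float], sampled: int, r: float) -> List[float]:
    """Score-function (REINFORCE) estimate of dE[r]/dlogits from a single sample.

    For p = softmax(logits), d log p[j] / d logits = onehot(j) - p,
    so the estimate is r * (onehot(j) - p)."""
    p = softmax(logits_row)
    return [r * ((1.0 if k == sampled else 0.0) - pk) for k, pk in enumerate(p)]


def train_row(logits_row, gt: int, steps: int = 300, lr: float = 0.5, seed: int = 0):
    """Toy loop: treat one row of logits as free parameters and run stochastic
    gradient ascent on the expected reward using sampled (discrete) matches."""
    rng = random.Random(seed)
    row = list(logits_row)
    for _ in range(steps):
        j = sample_match(softmax(row), rng)
        g = reinforce_grad(row, j, reward(j, gt))
        row = [l + lr * gk for l, gk in zip(row, g)]
    return row


if __name__ == "__main__":
    desc_a = [[1.0, 0.0], [0.0, 1.0]]              # toy query descriptors
    desc_b = [[0.6, 0.8], [0.8, 0.6], [1.0, 0.0]]  # toy candidate descriptors
    gt = [2, 1]  # hypothetical ground-truth match index for each query
    for i, row in enumerate(match_logits(desc_a, desc_b)):
        before = softmax(row)[gt[i]]
        after = softmax(train_row(row, gt[i]))[gt[i]]
        print(f"query {i}: p(correct) {before:.3f} -> {after:.3f}")
```

Because the gradient is taken of log p for the sampled index rather than of the sampling step itself, the discrete choice never has to be differentiated; averaged over samples, the estimate equals the gradient of the expected reward with respect to the logits. In the toy run, query 1 starts with most probability on a wrong candidate, which is the case the reward signal is meant to correct.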
Data Availability and access
The datasets analyzed during the current study are publicly available from the following resources: https://mediatum.ub.tum.de/1474000; https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html; http://matthewalunbrown.com/nirscene/nirscene.html; https://github.com/AmberHen/WHU-OPT-SAR-dataset.
Acknowledgements
This work is supported in part by a grant from the Social and Science Foundation of Liaoning Province (No. L20BTQ008), in part by the National Natural Science Foundation of China under Grant 61976124, and in part by the Scientific Research Fund of Yunnan Provincial Education Department under Grant 2021J0007.
Author information
Contributions
Yide Di: Conceptualization and Writing; Yun Liao: Methodology; Hao Zhou: Software; Kaijun Zhu: Validation; Yijia Zhang: Formal analysis; Qing Duan: Investigation; Junhui Liu: Data Curation; Mingyu Lu: Supervision.
Ethics declarations
Ethical and informed consent for data used
This submission does not include human or animal research.
Competing Interests
The authors declare that they have no conflicts of interest or competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Di, Y., Liao, Y., Zhou, H. et al. FeMIP: detector-free feature matching for multimodal images with policy gradient. Appl Intell 53, 24068–24088 (2023). https://doi.org/10.1007/s10489-023-04659-5
DOI: https://doi.org/10.1007/s10489-023-04659-5