Abstract
The effectiveness of metric-based few-shot learning methods relies heavily on the discriminative ability of the prototypes and of the query feature embeddings. However, instance-level unimodal prototypes often fail to capture the essence of a category. To this end, we propose a multimodal variational contrastive learning (MVC) framework that enhances prototype representativeness and refines the discrimination of query features by acquiring distribution-level representations. Our approach starts by training a variational auto-encoder through supervised contrastive learning in both the visual and semantic spaces. The trained model is then used to augment the support set by sampling features from the learned semantic distributions, and to generate pseudo-semantics for queries, so that information is balanced across samples in the support and query sets. Furthermore, we establish a multimodal instance-to-distribution model that learns to transform instance-level multimodal features into distribution-level representations via variational inference, which facilitates a robust metric. Experiments show that MVC consistently improves accuracy by between 0.5% and 7% over state-of-the-art methods on standard few-shot learning datasets such as miniImageNet, CIFAR-FS, tieredImageNet, and CUB, demonstrating the superiority of our method in terms of classification performance and robustness. The source code is available at: https://github.com/pmhDL/MVC.git.
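To make the support-set augmentation idea concrete, the following is a minimal sketch and not the released implementation. It assumes that each class already has a diagonal Gaussian (mean and log-variance) produced by some trained variational model, draws extra features with the reparameterization trick, averages them with the real support features to form a prototype, and classifies a query by its nearest prototype. All function names, the toy features, and the Gaussian parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_gaussian(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def augmented_prototype(support, mu, log_var, n_samples, rng):
    """Average the real support features together with sampled ones."""
    sampled = [sample_gaussian(mu, log_var, rng) for _ in range(n_samples)]
    return mean_vector(support + sampled)


def classify(query, prototypes):
    """Return the label of the nearest prototype (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))


# Toy 1-shot, 2-way episode with 2-D features (all numbers are made up).
rng = random.Random(0)
support = {"cat": [[1.0, 0.0]], "dog": [[0.0, 1.0]]}
# Hypothetical per-class (mean, log-variance) pairs, standing in for the
# distributions a trained variational model would provide.
class_dist = {
    "cat": ([1.2, 0.1], [-4.0, -4.0]),
    "dog": ([0.1, 1.2], [-4.0, -4.0]),
}
prototypes = {
    label: augmented_prototype(support[label], *class_dist[label],
                               n_samples=5, rng=rng)
    for label in support
}
prediction = classify([0.9, 0.2], prototypes)
```

In this sketch the sampled features simply pull the one-shot prototype towards the class-level mean; the framework described above additionally learns distribution-level representations for the metric itself, which the sketch does not attempt to reproduce.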
Data availability and access
The datasets used in the current study are available at https://paperswithcode.com/datasets?task=few-shot-image-classification.
Code availability
The code is available at: https://github.com/pmhDL/MVC.git.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62073219) and Science and Technology Commission of Shanghai Municipality (22511104100).
Author information
Contributions
Hongbin Shen contributed to the study conception and design. Material preparation, data collection, coding, and analysis were performed by Meihong Pan. The first draft of the manuscript was written by Meihong Pan, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and informed consent for data used
All the data is used for research only and will not be used for any other purpose. We use and process data based on the principles of Transparency, Innovation, Respect and Security.
About this article
Cite this article
Pan, M., Shen, H. Multimodal variational contrastive learning for few-shot classification. Appl Intell 54, 1879–1892 (2024). https://doi.org/10.1007/s10489-024-05269-5
DOI: https://doi.org/10.1007/s10489-024-05269-5