Multimodal variational contrastive learning for few-shot classification

Pan, Meihong; Shen, Hongbin

doi:10.1007/s10489-024-05269-5

Published: 24 January 2024

Applied Intelligence, volume 54, pages 1879–1892 (2024)

Abstract

The effectiveness of metric-based few-shot learning methods relies heavily on the discriminative ability of the prototypes and of the query feature embeddings. However, instance-level unimodal prototypes often fall short of capturing the essence of each category. To this end, we propose a multimodal variational contrastive learning (MVC) framework that aims to enhance prototype representativeness and refine the discrimination of query features by acquiring distribution-level representations. Our approach starts by training a variational auto-encoder through supervised contrastive learning in both the visual and semantic spaces. The trained model is then used to augment the support set by sampling features from the learned semantic distributions, and to generate pseudo-semantics for queries, so that samples in the support and query sets carry balanced information. Furthermore, we establish a multimodal instance-to-distribution model that learns to transform instance-level multimodal features into distribution-level representations via variational inference, which facilitates a robust metric. Experiments show that MVC consistently improves accuracy by between 0.5% and 7% over state-of-the-art methods on standard few-shot learning datasets such as miniImageNet, CIFAR-FS, tieredImageNet, and CUB, demonstrating the superiority of our method in terms of classification performance and robustness. The source code is available at: https://github.com/pmhDL/MVC.git.
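The support-set augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes each class already has a diagonal Gaussian with mean `mu` and log-variance `log_var`, draws extra features with the usual reparameterised sample z = mu + sigma * eps, averages them with the real support features into a prototype, and labels a query by its nearest prototype. All function names, the toy features, and the Gaussian parameters are hypothetical.

```python
import math
import random


def sample_gaussian(mu, log_var, rng):
    """Reparameterised draw z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def augmented_prototype(support, mu, log_var, n_samples, rng):
    """Average the real support features together with sampled features."""
    sampled = [sample_gaussian(mu, log_var, rng) for _ in range(n_samples)]
    return mean_vector(list(support) + sampled)


def nearest_prototype(query, prototypes):
    """Return the label whose prototype is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))


rng = random.Random(0)

# Hypothetical 1-shot, 2-way episode with 2-D features.
support = {"cat": [[1.0, 0.0]], "dog": [[0.0, 1.0]]}

# Assumed per-class (mean, log-variance) pairs; in a real system these
# would be produced by a trained model rather than written by hand.
class_dist = {
    "cat": ([1.2, 0.1], [-4.0, -4.0]),
    "dog": ([0.1, 1.2], [-4.0, -4.0]),
}

prototypes = {
    label: augmented_prototype(feats, *class_dist[label], n_samples=5, rng=rng)
    for label, feats in support.items()
}
prediction = nearest_prototype([0.9, 0.2], prototypes)
print(prediction)
```

With a single real support feature per class, the sampled features dominate the prototype, which is the intuition behind moving from instance-level to distribution-level representations. The supervised contrastive training of the auto-encoder and the generation of pseudo-semantics for queries are outside the scope of this sketch.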

Data availability and access

The datasets used in the current study are available at https://paperswithcode.com/datasets?task=few-shot-image-classification.

Code availability

The code is available at: https://github.com/pmhDL/MVC.git.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62073219) and Science and Technology Commission of Shanghai Municipality (22511104100).

Funding

This work was supported by the National Natural Science Foundation of China (62073219) and Science and Technology Commission of Shanghai Municipality (22511104100).

Author information

Authors and Affiliations

School of Electronic Information and Electrical Engineering, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai, 200240, China
Meihong Pan & Hongbin Shen

Contributions

Hongbin Shen contributed to the study conception and design. Material preparation, data collection, coding and analysis were performed by Meihong Pan. The first draft of the manuscript was written by Meihong Pan, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hongbin Shen.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

All the data is used for research only and will not be used for any other purpose. We use and process data based on the principles of Transparency, Innovation, Respect and Security.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pan, M., Shen, H. Multimodal variational contrastive learning for few-shot classification. Appl Intell 54, 1879–1892 (2024). https://doi.org/10.1007/s10489-024-05269-5

Accepted: 02 January 2024
Published: 24 January 2024
Issue Date: January 2024
DOI: https://doi.org/10.1007/s10489-024-05269-5
