
Multimodal variational contrastive learning for few-shot classification

Applied Intelligence

Abstract

The effectiveness of metric-based few-shot learning methods relies heavily on the discriminative ability of the prototypes and of the query feature embeddings. However, instance-level unimodal prototypes often fall short of capturing the essence of each category. To this end, we propose a multimodal variational contrastive learning (MVC) framework that aims to enhance prototype representativeness and refine the discrimination of query features by acquiring distribution-level representations. Our approach starts by training a variational auto-encoder through supervised contrastive learning in both the visual and semantic spaces. The trained model is then employed to augment the support set by sampling features from the learned semantic distributions, and to generate pseudo-semantics for queries so that information is balanced across samples in the support and query sets. Furthermore, we establish a multimodal instance-to-distribution model that learns to transform instance-level multimodal features into distribution-level representations via variational inference, facilitating a robust metric. Experiments show that MVC consistently improves accuracy by between 0.5% and 7% over state-of-the-art methods on standard few-shot learning datasets such as miniImageNet, CIFAR-FS, tieredImageNet, and CUB, demonstrating the superiority of our method in terms of classification performance and robustness. The source code is available at: https://github.com/pmhDL/MVC.git.
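
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of augmenting a support set with features sampled from a class-level Gaussian and then classifying a query by its nearest prototype. It is not the MVC implementation. All function names, the 2-D toy features, and the per-class means and standard deviations are hypothetical; in the paper, the distributions would be produced by the trained variational auto-encoder.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sample_gaussian(mu, sigma, rng):
    """Reparameterized draw: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def augmented_prototype(support, mu, sigma, n_samples, rng):
    """Average the real support features with features sampled from N(mu, sigma^2)."""
    sampled = [sample_gaussian(mu, sigma, rng) for _ in range(n_samples)]
    return mean_vector(support + sampled)


def classify(query, prototypes):
    """Return the label whose prototype is nearest to the query (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


rng = random.Random(0)

# Hypothetical 1-shot, 2-way episode with 2-D features.
support = {"cat": [[1.0, 0.9]], "dog": [[-1.0, -1.1]]}

# Hypothetical per-class (mean, std) pairs standing in for learned distributions.
class_dists = {
    "cat": ([1.2, 1.0], [0.1, 0.1]),
    "dog": ([-1.2, -1.0], [0.1, 0.1]),
}

prototypes = {
    label: augmented_prototype(support[label], *class_dists[label], n_samples=20, rng=rng)
    for label in support
}

prediction = classify([0.8, 1.1], prototypes)
print(prediction)
```

With a single support example per class, the prototype would otherwise be that one feature vector. Averaging in sampled features pulls it toward the class-level mean, which is the intuition behind distribution-level prototypes.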

Data availability and access

The datasets used in the current study are available at https://paperswithcode.com/datasets?task=few-shot-image-classification.

Code availability

The code is available at: https://github.com/pmhDL/MVC.git.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62073219) and Science and Technology Commission of Shanghai Municipality (22511104100).

Author information

Contributions

Hongbin Shen contributed to the study conception and design. Material preparation, data collection, coding, and analysis were performed by Meihong Pan. The first draft of the manuscript was written by Meihong Pan, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hongbin Shen.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

All the data is used for research only and will not be used for any other purpose. We use and process data based on the principles of Transparency, Innovation, Respect and Security.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pan, M., Shen, H. Multimodal variational contrastive learning for few-shot classification. Appl Intell 54, 1879–1892 (2024). https://doi.org/10.1007/s10489-024-05269-5

  • DOI: https://doi.org/10.1007/s10489-024-05269-5
