
Two momentum contrast in triplet for unsupervised visual representation learning

Published: Multimedia Tools and Applications

Abstract

In unsupervised representation learning, self-supervised learning has made great progress through its combination with contrastive learning. The core idea of contrastive self-supervised learning is to pull positive sample pairs closer together and push negative sample pairs apart. However, a disadvantage of such methods, including MoCo, is that some positive samples can be misclassified as negative samples, which harms the learning ability of the model. To address this problem, we propose two momentum contrast in triplet (TMCT) for unsupervised visual representation learning. The method maps the obtained representations into another space and employs the samples from a third network as the target for the final learning of the model. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method. TMCT obtains a classification accuracy of 84.50% on CIFAR10, which is 2.47% higher than SimCLR.
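The contrastive objective described above can be illustrated with the InfoNCE loss that underlies MoCo: each query is scored against its positive key and a queue of negative keys, and trained to pick out the positive. The sketch below is a minimal illustration of that baseline loss, not the paper's TMCT method; the function name, shapes, and temperature value are our own assumptions for the example.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k, queue, temperature=0.07):
    """MoCo-style InfoNCE loss (illustrative sketch).

    q:     (N, D) query embeddings from the online encoder
    k:     (N, D) positive key embeddings from the momentum encoder
    queue: (K, D) negative key embeddings from the memory queue
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    queue = F.normalize(queue, dim=1)

    # Positive logits: one cosine similarity per query, shape (N, 1).
    l_pos = torch.sum(q * k, dim=1, keepdim=True)
    # Negative logits: each query against every queued key, shape (N, K).
    l_neg = q @ queue.t()

    # The positive key sits at column 0 for every query, so the loss is
    # cross-entropy with an all-zeros target.
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```

Under this formulation, any queued key from the same class as the query is still treated as a negative; this is the false-negative issue the abstract refers to.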


Data Availability

The datasets analysed during the current study are openly available in public repositories: CIFAR10 at http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz, CIFAR100 at http://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz, and TinyImageNet at https://tiny-imagenet.herokuapp.com.

References

  1. Bromley J, Guyon I, LeCun Y, Sackinger E, Shah R (1993) Signature verification using a “siamese” time delay neural network. In: Advances in neural information processing systems. pp 737–744

  2. Cai T, Frankle J, Schwab DJ, Morcos AS (2021) Are all negatives created equal in contrastive instance discrimination? arXiv:2010.06682

  3. Caron M, Bojanowski P, Joulin A, Douze M (2018) Deep clustering for unsupervised learning of visual features. In: European conference on computer vision. pp 1–30

  4. Caron M, Misra I, Goyal P, Bojanowski P, Joulin A (2020) Unsupervised learning of visual features by contrasting cluster assignments. In: Advances in neural information processing systems. pp 1–23

  5. Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning. pp 1597–1607

  6. Chen X, Fan H, Girshick R, He K (2020) Improved baselines with momentum contrastive learning. arXiv:2003.04297

  7. Chen X, He K (2021) Exploring simple siamese representation learning. In: IEEE conference on computer vision and pattern recognition. pp 15745–15753

  8. Doersch C, Gupta A, Efros AA (2015) Unsupervised visual representation learning by context prediction. In: IEEE international conference on computer vision. pp 1422–1430

  9. Gidaris S, Singh P, Komodakis N (2018) Unsupervised representation learning by predicting image rotations. In: International conference on learning representations. pp 1–16

  10. Grill J-B, Strub F, Altché F, Tallec C et al (2020) Bootstrap your own latent: A new approach to self-supervised learning. In: Advances in neural information processing systems. pp 21271–21284

  11. He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: IEEE conference on computer vision and pattern recognition. pp 9726–9735

  12. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: International conference on machine learning. pp 1–8

  13. Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F (2021) A survey on contrastive self-supervised learning. arXiv:2011.00362

  14. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. pp 1–60

  15. Noroozi M, Favaro P (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In: European conference on computer vision, pp 69–84

  16. Pathak D, Krahenbuhl P, Donahue J, Darrel T, Efros AA (2016) Context encoders: Feature learning by inpainting. In: IEEE conference on computer vision and pattern recognition. pp 2536–2544

  17. Pouransari H, Ghili S (2015) Tiny imagenet visual recognition challenge

  18. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition. pp 770–778

  19. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252


  20. Tian Y, Sun C, Poole B, Krishnan D, Schmid C, Isola P (2020) What makes for good views for contrastive learning. In: Advances in neural information processing systems. pp 1–24

  21. Wang X, Zhang R, Shen C, Li L (2021) Dense contrastive learning for self-supervised visual pre-training. In: IEEE conference on computer vision and pattern recognition. pp 3024–3033

  22. Wu Z, Xiong Y, Yu SX, Lin D (2018) Unsupervised feature learning via non-parametric instance discrimination. In: IEEE conference on computer vision and pattern recognition. pp 3733–3742

  23. Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: European conference on computer vision. pp 649–666


Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Number 619006098).

Author information

Corresponding author

Correspondence to Xianzhong Long.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Long, X., Du, H. & Li, Y. Two momentum contrast in triplet for unsupervised visual representation learning. Multimed Tools Appl 83, 10467–10480 (2024). https://doi.org/10.1007/s11042-023-15998-3

