Abstract
In unsupervised representation learning, self-supervised learning has made great progress through its combination with contrastive learning. The core idea of contrastive self-supervised learning is to pull positive sample pairs closer together while pushing negative sample pairs apart. However, a drawback of such methods, MoCo among them, is that some positive samples can be misclassified as negative samples, which degrades the learning ability of the model. To address this problem, we propose two momentum contrast in triplet (TMCT) for unsupervised visual representation learning. The method maps the obtained representations to another space and uses the samples from a third network as the final learning target of the model. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method. TMCT achieves a classification accuracy of 84.50% on CIFAR10, which is 2.47% higher than SimCLR.
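The MoCo-style mechanism the abstract builds on can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' TMCT implementation: it shows a momentum-updated (EMA) key encoder alongside the InfoNCE objective that pulls a query toward its positive key and away from negative keys. The linear encoders, dimensions, and hyperparameters (`tau`, `m`) are illustrative assumptions.

```python
import numpy as np

def momentum_update(w_q, w_k, m=0.999):
    """EMA update of the key encoder from the query encoder (MoCo-style):
    the key encoder evolves slowly, keeping its outputs consistent."""
    return m * w_k + (1.0 - m) * w_q

def info_nce(q, k_pos, k_negs, tau=0.07):
    """InfoNCE loss for a single query: maximize similarity with the
    positive key, minimize it with the negative keys."""
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    k_negs = k_negs / np.linalg.norm(k_negs, axis=1, keepdims=True)
    # logits: positive pair first, then all negatives, scaled by temperature
    logits = np.concatenate(([q @ k_pos], k_negs @ q)) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0

rng = np.random.default_rng(0)
w_q = rng.standard_normal((8, 16))   # query encoder (here: a linear map)
w_k = w_q.copy()                     # key encoder starts as a copy

x, x_aug = rng.standard_normal(16), rng.standard_normal(16)  # two "views"
negs = rng.standard_normal((4, 16))                          # other samples

loss = info_nce(w_q @ x, w_k @ x_aug, (w_k @ negs.T).T)
w_k = momentum_update(w_q, w_k)      # slow-moving key encoder
print(float(loss) > 0.0)
```

The failure mode the abstract targets is visible here: any sample placed in `negs` is treated as a negative even if it is semantically close to `x`, which is the misclassification of positives that TMCT is designed to mitigate.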
Data Availability
The data are openly available in public repositories. The datasets analysed during the current study are available as follows: CIFAR10 at http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz, CIFAR100 at http://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz, and TinyImageNet at https://tiny-imagenet.herokuapp.com.
References
Bromley J, Guyon I, LeCun Y, Sackinger E, Shah R (1993) Signature verification using a “siamese” time delay neural network. In: Advances in neural information processing systems. pp 737–744
Cai T, Frankle J, Schwab DJ, Morcos AS (2021) Are all negatives created equal in contrastive instance discrimination? arXiv:2010.06682
Caron M, Bojanowski P, Joulin A, Douze M (2018) Deep clustering for unsupervised learning of visual features. In: European conference on computer vision. pp 1–30
Caron M, Misra I, Goyal P, Bojanowski P, Joulin A (2020) Unsupervised learning of visual features by contrasting cluster assignments. In: Advances in neural information processing systems. pp 1–23
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning. pp 1597–1607
Chen X, Fan H, Girshick R, He K (2020) Improved baselines with momentum contrastive learning. arXiv:2003.04297
Chen X, He K (2021) Exploring simple siamese representation learning. In: IEEE conference on computer vision and pattern recognition. pp 15745–15753
Doersch C, Gupta A, Efros AA (2015) Unsupervised visual representation learning by context prediction. In: IEEE international conference on computer vision. pp 1422–1430
Gidaris S, Singh P, Komodakis N (2018) Unsupervised representation learning by predicting image rotations. In: International conference on learning representations. pp 1–16
Grill J-B, Strub F, Altché F, Tallec C et al (2020) Bootstrap your own latent: A new approach to self-supervised learning. In: Advances in neural information processing systems. pp 21271–21284
He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: IEEE conference on computer vision and pattern recognition. pp 9726–9735
Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: International conference on machine learning. pp 1–8
Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F (2021) A survey on contrastive self-supervised learning. arXiv:2011.00362
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. pp 1–60
Noroozi M, Favaro P (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In: European conference on computer vision, pp 69–84
Pathak D, Krahenbuhl P, Donahue J, Darrel T, Efros AA (2016) Context encoders: Feature learning by inpainting. In: IEEE conference on computer vision and pattern recognition. pp 2536–2544
Pouransari H, Ghili S (2015) Tiny imagenet visual recognition challenge
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition. pp 770–778
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252
Tian Y, Sun C, Poole B, Krishnan D, Schmid C, Isola P (2020) What makes for good views for contrastive learning. In: Advances in neural information processing systems. pp 1–24
Wang X, Zhang R, Shen C, Li L (2021) Dense contrastive learning for self-supervised visual pre-training. In: IEEE conference on computer vision and pattern recognition. pp 3024–3033
Wu Z, Xiong Y, Yu SX, Lin D (2018) Unsupervised feature learning via non-parametric instance discrimination. In: IEEE conference on computer vision and pattern recognition. pp 3733–3742
Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: European conference on computer vision. pp 649–666
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Number 619006098).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Long, X., Du, H. & Li, Y. Two momentum contrast in triplet for unsupervised visual representation learning. Multimed Tools Appl 83, 10467–10480 (2024). https://doi.org/10.1007/s11042-023-15998-3