Towards Understanding the Mechanism of Contrastive Learning via Similarity Structure: A Theoretical Analysis

  • Conference paper
  • In: Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

Contrastive learning is an efficient approach to self-supervised representation learning. Although recent studies have made progress in the theoretical understanding of contrastive learning, how to characterize the clusters of the learned representations is still largely unexplored. In this paper, we aim to elucidate this characterization from a theoretical perspective. To this end, we consider a kernel-based contrastive learning framework termed Kernel Contrastive Learning (KCL), where kernel functions play an important role when applying our theoretical results to other frameworks. We introduce a formulation of the similarity structure of learned representations by utilizing a statistical dependency viewpoint. We investigate the theoretical properties of the kernel-based contrastive loss via this formulation. We first prove that the formulation characterizes the structure of representations learned with the kernel-based contrastive learning framework. We then derive a new upper bound on the classification error of a downstream task, which shows that our theory is consistent with the empirical success of contrastive learning. We also establish a generalization error bound of KCL. Finally, we show a guarantee for the generalization ability of KCL to the downstream classification task via a surrogate bound.
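
To make the setting concrete, here is a minimal sketch of the kind of kernel-based contrastive objective described above: embeddings of two augmented views are compared through a kernel, and the loss rewards high kernel similarity for positive pairs and low similarity for negative pairs. This is an illustration only, under our own assumptions (a Gaussian kernel, the names gaussian_kernel and kernel_contrastive_loss, and the particular averaging are chosen here for exposition); the precise KCL objective and its conditions are defined in the paper.

    import numpy as np

    def gaussian_kernel(u, v, gamma=1.0):
        # Pairwise Gaussian (RBF) kernel between the rows of u and v.
        sq_dists = (
            np.sum(u ** 2, axis=1)[:, None]
            - 2.0 * u @ v.T
            + np.sum(v ** 2, axis=1)[None, :]
        )
        return np.exp(-gamma * sq_dists)

    def kernel_contrastive_loss(z1, z2, gamma=1.0):
        # z1, z2: (n, d) embeddings of two augmented views of the same n inputs,
        # so (z1[i], z2[i]) is a positive pair and cross-index pairs act as negatives.
        k = gaussian_kernel(z1, z2, gamma)                  # (n, n) kernel matrix
        n = k.shape[0]
        pos = np.mean(np.diag(k))                           # mean similarity of positive pairs
        neg = (k.sum() - np.trace(k)) / (n * (n - 1))       # mean similarity of negative pairs
        # Pull positive pairs together and push negative pairs apart in kernel similarity.
        return -pos + neg

    # Toy usage with random embeddings: the second view is a small perturbation
    # of the first, so positive pairs are close and the loss is negative.
    rng = np.random.default_rng(0)
    z1 = rng.normal(size=(8, 16))
    z2 = z1 + 0.1 * rng.normal(size=(8, 16))
    z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
    z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
    print(kernel_contrastive_loss(z1, z2))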

Notes

  1. Supplementary material is available at https://github.com/hrkyd/KernelCL/tree/main/supplementary_material.

  2. Code is available at https://github.com/hrkyd/KernelCL/tree/main/code.

Acknowledgements

The authors would like to thank Tomohiro Hayase and Takayuki Kawashima for useful comments. TK was partially supported by JSPS KAKENHI Grant Numbers 19H04071, 20H00576, and 23H03460.

Author information

Corresponding author

Correspondence to Hiroki Waida.

Ethics declarations

Since this paper mainly presents a theoretical analysis of contrastive learning, we do not expect it to have a direct negative societal impact. However, revealing detailed properties of contrastive learning could create opportunities to misuse that knowledge. We note that such misuse is not straightforward with the proposed method, since its applications are not discussed in depth in this paper.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Waida, H., Wada, Y., Andéol, L., Nakagawa, T., Zhang, Y., Kanamori, T. (2023). Towards Understanding the Mechanism of Contrastive Learning via Similarity Structure: A Theoretical Analysis. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14172. Springer, Cham. https://doi.org/10.1007/978-3-031-43421-1_42

  • DOI: https://doi.org/10.1007/978-3-031-43421-1_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43420-4

  • Online ISBN: 978-3-031-43421-1

  • eBook Packages: Computer Science, Computer Science (R0)
