Towards Understanding the Mechanism of Contrastive Learning via Similarity Structure: A Theoretical Analysis

  • Conference paper
  • In: Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

Contrastive learning is an efficient approach to self-supervised representation learning. Although recent studies have made progress in the theoretical understanding of contrastive learning, how to characterize the clusters of the learned representations is still largely unexplored. In this paper, we aim to elucidate this characterization from a theoretical perspective. To this end, we consider a kernel-based contrastive learning framework termed Kernel Contrastive Learning (KCL), where kernel functions play an important role when applying our theoretical results to other frameworks. We introduce a formulation of the similarity structure of learned representations by utilizing a statistical dependency viewpoint. We investigate the theoretical properties of the kernel-based contrastive loss via this formulation. We first prove that the formulation characterizes the structure of representations learned with the kernel-based contrastive learning framework. We then derive a new upper bound on the classification error of a downstream task, which shows that our theory is consistent with the empirical success of contrastive learning. We also establish a generalization error bound of KCL. Finally, we show a guarantee for the generalization ability of KCL to the downstream classification task via a surrogate bound.
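
To make the setting concrete, here is a minimal sketch of the kind of kernel-based contrastive objective described above: embeddings of two augmented views are compared through a kernel, and the loss rewards high kernel similarity for positive pairs and low similarity for negative pairs. This is an illustration only, under our own assumptions (a Gaussian kernel, the names gaussian_kernel and kernel_contrastive_loss, and the particular averaging are chosen here for exposition); the precise KCL objective and its conditions are defined in the paper.

    import numpy as np

    def gaussian_kernel(u, v, gamma=1.0):
        # Pairwise Gaussian (RBF) kernel between the rows of u and v.
        sq_dists = (
            np.sum(u ** 2, axis=1)[:, None]
            - 2.0 * u @ v.T
            + np.sum(v ** 2, axis=1)[None, :]
        )
        return np.exp(-gamma * sq_dists)

    def kernel_contrastive_loss(z1, z2, gamma=1.0):
        # z1, z2: (n, d) embeddings of two augmented views of the same n inputs,
        # so (z1[i], z2[i]) is a positive pair and cross-index pairs act as negatives.
        k = gaussian_kernel(z1, z2, gamma)                  # (n, n) kernel matrix
        n = k.shape[0]
        pos = np.mean(np.diag(k))                           # mean similarity of positive pairs
        neg = (k.sum() - np.trace(k)) / (n * (n - 1))       # mean similarity of negative pairs
        # Pull positive pairs together and push negative pairs apart in kernel similarity.
        return -pos + neg

    # Toy usage with random embeddings: the second view is a small perturbation
    # of the first, so positive pairs are close and the loss is negative.
    rng = np.random.default_rng(0)
    z1 = rng.normal(size=(8, 16))
    z2 = z1 + 0.1 * rng.normal(size=(8, 16))
    z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
    z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
    print(kernel_contrastive_loss(z1, z2))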

Notes

  1. Supplementary material is available at https://github.com/hrkyd/KernelCL/tree/main/supplementary_material.

  2. Code is available at https://github.com/hrkyd/KernelCL/tree/main/code.

Acknowledgements

The authors would like to thank Tomohiro Hayase and Takayuki Kawashima for useful comments. TK was partially supported by JSPS KAKENHI Grant Numbers 19H04071, 20H00576, and 23H03460.

Author information

Corresponding author

Correspondence to Hiroki Waida.

Ethics declarations

Since this paper mainly presents a theoretical analysis of contrastive learning, we do not expect it to have a direct negative societal impact. However, revealing detailed properties of contrastive learning could create opportunities to misuse that knowledge. We note that such misuse is not straightforward with the proposed method, since its applications are not discussed in depth in this paper.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Waida, H., Wada, Y., Andéol, L., Nakagawa, T., Zhang, Y., Kanamori, T. (2023). Towards Understanding the Mechanism of Contrastive Learning via Similarity Structure: A Theoretical Analysis. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14172. Springer, Cham. https://doi.org/10.1007/978-3-031-43421-1_42

  • DOI: https://doi.org/10.1007/978-3-031-43421-1_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43420-4

  • Online ISBN: 978-3-031-43421-1

  • eBook Packages: Computer Science, Computer Science (R0)
