Abstract
During the last few years, deep supervised learning models have been shown to achieve state-of-the-art results on Natural Language Processing tasks. Most of these models are trained by minimizing the commonly used cross-entropy loss. However, the latter may suffer from several shortcomings such as sub-optimal generalization and unstable fine-tuning. Inspired by recent work on self-supervised contrastive representation learning, we present SimSCL, a framework for the binary text classification task that relies on two simple concepts: (i) sampling positive and negative examples for a given anchor by treating sentences belonging to the same class as the anchor as positive examples and samples belonging to a different class as negative examples, and (ii) using a novel fully-supervised contrastive loss that enforces more compact clustering by leveraging label information more effectively. The experimental results show that our framework outperforms the standard cross-entropy loss on several benchmark datasets. Further experiments on Moroccan and Algerian dialects demonstrate that our framework also works well for under-resourced languages.
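The loss itself is only described at a high level in the abstract. For readers who want a concrete picture of the recipe (in-batch sentences sharing the anchor's label are positives, all others are negatives), below is a minimal PyTorch sketch of a generic supervised contrastive loss in the spirit of Khosla et al. (2020). The function name, temperature default, and exact formulation here are illustrative assumptions and do not reproduce the paper's novel SimSCL loss.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Batch-wise supervised contrastive loss (illustrative, not SimSCL's exact loss).

    embeddings: (batch, dim) sentence representations from the encoder.
    labels:     (batch,) integer class labels (0/1 for binary classification).
    """
    batch_size = embeddings.size(0)

    # Cosine similarity between every pair of samples, scaled by temperature.
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature

    # Exclude each anchor's similarity with itself.
    self_mask = torch.eye(batch_size, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, float('-inf'))

    # Positives share the anchor's label; every other sample acts as a negative.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Per-anchor log-softmax over the batch (logsumexp for numerical stability).
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Mean log-probability of the positives, skipping anchors with no positives.
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    sum_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return -(sum_pos[valid] / pos_counts[valid]).mean()


# Example: 8 random 128-d embeddings with binary labels.
loss = supervised_contrastive_loss(torch.randn(8, 128), torch.randint(0, 2, (8,)))
```

Averaging the log-probabilities of all positives per anchor (rather than summing) keeps the loss scale independent of how many same-class examples happen to land in a batch.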
Y. Moukafih and A. Ghanem contributed equally.
Notes
1. This corpus will be made public.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Moukafih, Y., Ghanem, A., Abidi, K., Sbihi, N., Ghogho, M., Smaili, K. (2022). SimSCL: A Simple Fully-Supervised Contrastive Learning Framework for Text Representation. In: Long, G., Yu, X., Wang, S. (eds) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_59
DOI: https://doi.org/10.1007/978-3-030-97546-3_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97545-6
Online ISBN: 978-3-030-97546-3
eBook Packages: Computer Science, Computer Science (R0)