On the Robustness of Semi-supervised Learning for Cyberbullying Detection in Social Media

Dumitrescu, Andrei; Ionescu, Diana; Rebedea, Traian

doi:10.1007/978-3-031-53957-2_6

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 36)

Included in the following conference series:

Romanian Conference on Human-Computer Interaction

Abstract

Cyberbullying has become a common form of harassment because most of our communication now takes place through digital technologies. This type of bullying can affect mental, emotional, and physical health, and its impact is amplified by how easily and quickly it can spread around the world. In most cases, a cyberbullying attack is discovered too late, after the negative effects have already harmed the victim. Researchers have proposed several solutions for detecting cyberbullying attacks on social media, mainly using machine learning techniques. While Artificial Intelligence provides a way to detect cyberbullying, Human-Computer Interaction principles can be integrated with machine learning to offer users a reliable feature that protects them from cyberbullying. Most of the proposed solutions are supervised approaches that leverage the representational power of deep neural networks. Today, unlabeled data heavily outnumbers annotated data because it is easier and more convenient to obtain. Labeling samples usually requires manual annotation by human experts, a process that is expensive and error-prone. The aim of this chapter is to leverage the large amount of unlabeled data that can be easily collected and use it with semi-supervised learning approaches to enhance cyberbullying detection.
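
The abstract does not say which semi-supervised method the chapter uses, so the following is only a generic illustration of the idea of combining a small labeled set with a larger unlabeled pool. It sketches self-training with pseudo-labels: a classifier is fit on the labeled posts, confident predictions on unlabeled posts are added as if they were labels, and the classifier is refit. Everything here is an assumption made for illustration: the toy Naive Bayes model stands in for a deep neural network, and the function names (`train`, `predict_proba`, `self_train`), the confidence threshold of 0.8, and the example posts are all hypothetical.

```python
import math
from collections import Counter


def train(texts, labels):
    """Fit a tiny multinomial Naive Bayes model (toy stand-in for a neural classifier)."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for text, y in zip(texts, labels):
        counts[y].update(text.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, priors, vocab


def predict_proba(model, text):
    """Return P(label = 1 | text) using Laplace-smoothed word likelihoods."""
    counts, priors, vocab = model
    total = sum(priors.values())
    log_scores = {}
    for y in (0, 1):
        n_words = sum(counts[y].values())
        score = math.log((priors[y] + 1) / (total + 2))
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[y][word] + 1) / (n_words + len(vocab)))
        log_scores[y] = score
    top = max(log_scores.values())
    e0 = math.exp(log_scores[0] - top)
    e1 = math.exp(log_scores[1] - top)
    return e1 / (e0 + e1)


def self_train(texts, labels, unlabeled, threshold=0.8, rounds=3):
    """Grow the labeled set with confident pseudo-labels, then refit the model."""
    texts, labels, pool = list(texts), list(labels), list(unlabeled)
    for _ in range(rounds):
        model = train(texts, labels)
        remaining = []
        for text in pool:
            p = predict_proba(model, text)
            if max(p, 1 - p) >= threshold:
                texts.append(text)            # accept the confident prediction
                labels.append(int(p >= 0.5))  # as a pseudo-label
            else:
                remaining.append(text)        # too uncertain, keep unlabeled
        if len(remaining) == len(pool):
            break  # nothing new was pseudo-labeled, so stop early
        pool = remaining
    return train(texts, labels), pool


# Hypothetical toy data: 1 = bullying, 0 = not bullying.
labeled_texts = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "have a great day friend",
    "thanks for the lovely photo",
]
labeled_y = [1, 1, 0, 0]
unlabeled_texts = ["stupid ugly loser", "lovely day friend", "see you at lunch"]

model, leftover = self_train(labeled_texts, labeled_y, unlabeled_texts)
print(leftover)
```

The threshold controls the trade-off that makes robustness a concern in this setting: a low value admits more unlabeled posts but also more wrong pseudo-labels, while a high value leaves most of the pool unused.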

A. Dumitrescu, D. Ionescu, and T. Rebedea: These authors contributed equally to this work.

Author information

Authors and Affiliations

Department of Computer Science, University Politehnica of Bucharest, 313 Splaiul Independentei, Bucharest, Romania
Andrei Dumitrescu, Diana Ionescu & Traian Rebedea

Corresponding author

Correspondence to Traian Rebedea.

Editor information

Editors and Affiliations

LAMIH - CNRS, Université Polytechnique Hauts-de-France, Valenciennes, France
Christophe Kolski
Faculty of Automatics, Computers and Electronics, University of Craiova, Craiova, Dolj, Romania
Marian Cristian Mihăescu
Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
Traian Rebedea

About this paper

Cite this paper

Dumitrescu, A., Ionescu, D., Rebedea, T. (2024). On the Robustness of Semi-supervised Learning for Cyberbullying Detection in Social Media. In: Kolski, C., Mihăescu, M.C., Rebedea, T. (eds) AI Approaches for Designing and Evaluating Interactive Intelligent Systems. ROCHI 2022. Learning and Analytics in Intelligent Systems, vol 36. Springer, Cham. https://doi.org/10.1007/978-3-031-53957-2_6

DOI: https://doi.org/10.1007/978-3-031-53957-2_6
Published: 10 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53956-5
Online ISBN: 978-3-031-53957-2
eBook Packages: Intelligent Technologies and Robotics (R0)
