Offensive language and hate speech detection using deep learning in football news live streaming chat on YouTube in Thailand

Pookpanich, Peerat; Siriborvornratanakul, Thitirat

doi:10.1007/s13278-023-01183-9

Case Report
Published: 03 January 2024

Volume 14, article number 18 (2024)
Social Network Analysis and Mining


Abstract

Today, hate speech is frequently seen on Thai social media platforms such as Facebook and Twitter, and even on online video platforms such as YouTube. In live video broadcasts of football news, for example, some Thai users direct hate speech at opposing football fans and players. This paper presents offensive language and hate speech detection for Thai in YouTube live streaming chat using five transformer-based language models: BERT, XLM-RoBERTa, DistilBERT, WangchanBERTa, and TwHIN-BERT, which were pretrained on multilingual data as well as Thai. A two-step data labeling procedure was developed: the first step was automated labeling with the WangchanBERTa model, and the second was manual labeling by the researchers. We trained text classification models on 11 datasets with different ratios of positive to negative classes to find the most efficient model. In terms of recall and F1 score, XLM-RoBERTa performed best, with an average recall of 0.9669 and an average F1 score of 0.9530. However, the performance of the five models did not differ significantly. Considering the intended application, DistilBERT is the most appropriate choice because it performs similarly to XLM-RoBERTa while being smaller and faster.
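The abstract compares models by recall and F1 score. As a reminder of what those two numbers measure, the following minimal sketch computes both for a binary classifier from lists of true and predicted labels. It is an illustration only, not the authors' evaluation code: the function name `recall_and_f1`, the choice of 1 as the positive (offensive or hateful) class, and the toy labels are all assumptions made for this example.

```python
def recall_and_f1(y_true, y_pred, positive=1):
    """Return (recall, F1) for the positive class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Recall: share of truly positive messages that the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of flagged messages that were truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy labels, made up for illustration (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
recall, f1 = recall_and_f1(y_true, y_pred)
print(f"recall={recall:.4f}  f1={f1:.4f}")
```

Recall is the natural headline metric for moderation, since a missed offensive message is usually costlier than a false alarm, while F1 keeps precision in view so that a model cannot score well simply by flagging everything.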

Data availability

The data that support the findings of this study are not openly available because of their sensitivity, but they are available from the corresponding author upon reasonable request. The data are stored on the researcher's local computer.

Funding

No funding.

Author information

Authors and Affiliations

Graduate School of Applied Statistics, National Institute of Development Administration, Bangkok, 10240, Thailand
Peerat Pookpanich & Thitirat Siriborvornratanakul

Contributions

All authors contributed equally to this manuscript.

Corresponding author

Correspondence to Thitirat Siriborvornratanakul.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pookpanich, P., Siriborvornratanakul, T. Offensive language and hate speech detection using deep learning in football news live streaming chat on YouTube in Thailand. Soc. Netw. Anal. Min. 14, 18 (2024). https://doi.org/10.1007/s13278-023-01183-9

Received: 21 September 2023
Accepted: 06 December 2023
Published: 03 January 2024
DOI: https://doi.org/10.1007/s13278-023-01183-9
