research-article

Detection of Offensive Language and Its Severity for Low Resource Language

Authors:
Ramsha Saeed, National University of Science and Technology (NUST), Pakistan. ORCID: 0000-0002-9504-0368
Hammad Afzal, National University of Science and Technology (NUST), Pakistan. ORCID: 0000-0001-9583-5585
Sadaf Abdul Rauf, Fatima Jinnah Women University (FJWU), Pakistan. ORCID: 0000-0003-0400-3869
Naima Iltaf, National University of Science and Technology (NUST), Pakistan. ORCID: 0000-0001-5392-5187

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 6, Article No. 156, pp. 1–27
https://doi.org/10.1145/3580476

Published: 17 June 2023

Abstract

The continuous proliferation of hate speech in different languages on social media has drawn significant attention from researchers over the past decade. Because hate speech inflicts serious harm on society, detecting it is indispensable regardless of how widely a language is used. This work presents a first resource for classifying the severity of hate speech, in addition to classifying offensive and hate speech content. Current research mostly limits hate speech classification to its primary categories, such as racism, sexism, and hatred of religions. However, hate speech targeted at different protected characteristics also manifests in different forms and intensities. Understanding the varying severity levels of hate speech is important so that the most harmful cases can be identified and dealt with earlier than the less harmful ones. In this work, we focus on detecting offensive speech, hate speech, and multiple severity levels of hate speech in the Urdu language. We investigate three primary target categories of hate speech: religion, racism, and national origin. We further divide these categories into levels based on the severity of hate conveyed; the severity levels are referred to as symbolization, insult, and attribution. A corpus of more than 20,000 tweets is collected and annotated against the corresponding hate speech categories and severity levels. A comprehensive set of experiments is conducted using traditional as well as deep learning–based models to examine their effectiveness for hate speech detection. The highest macro-averaged F-score for detecting offensive speech is 86%, while the highest F-scores for detecting hate speech with respect to ethnicity, national origin, and religious affiliation are 80%, 81%, and 72%, respectively. These results are encouraging and provide a basis for further investigation in this domain.
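The abstract reports results as macro-averaged F-scores. As a reminder of what that metric measures, the following is a minimal Python sketch, not the authors' evaluation code. The function name `macro_f1`, the class labels, and the toy predictions are all hypothetical and are not drawn from the paper's corpus. Macro-averaging computes an F1 score per class and takes the unweighted mean, so rare classes count as much as frequent ones.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the macro-averaged F1: the unweighted mean of per-class F1."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one label")

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only.
y_true = ["neutral", "neutral", "neutral", "offensive", "hate", "hate"]
y_pred = ["neutral", "neutral", "offensive", "offensive", "hate", "neutral"]
score = macro_f1(y_true, y_pred)
print(f"macro-averaged F1 = {score:.3f}")
```

In this toy example each of the three classes has a per-class F1 of 2/3, so the macro average is also 2/3, even though the classes have different numbers of examples.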

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 6
June 2023
635 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3604597
Editor: Imed Zitouni, Google, USA
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 June 2023
- Online AM: 19 January 2023
- Accepted: 6 January 2023
- Revised: 6 November 2022
- Received: 27 April 2022
Author Tags
Hate speech
long short-term memory
Urdu NLP
convolutional neural network
BERT