research-article

Detection of Offensive Language and Its Severity for Low Resource Language

Published: 17 June 2023

Abstract

The continuous proliferation of hate speech in different languages on social media has drawn significant attention from researchers in the past decade. Detecting hate speech is indispensable regardless of how widely a language is used, as it inflicts great harm on society. This work presents a first resource for classifying the severity of hate speech in addition to classifying offensive and hate speech content. Current research mostly limits hate speech classification to its primary categories, such as racism, sexism, and hatred of religions. However, hate speech targeted at different protected characteristics also manifests in different forms and intensities. It is important to understand the varying severity levels of hate speech so that the most harmful cases can be identified and dealt with earlier than the less harmful ones. In this work, we focus on detecting offensive speech, hate speech, and multiple levels of hate speech in the Urdu language. We investigate three primary target categories of hate speech: religion, racism, and national origin. We further divide these categories into levels based on the severity of hate conveyed. The severity levels are referred to as symbolization, insult, and attribution. A corpus of more than 20,000 tweets is collected and annotated with the corresponding hate speech categories and severity levels. A comprehensive set of experiments is conducted with traditional as well as deep learning–based models to compare their performance on hate speech detection. The highest macro-averaged F-score for detecting offensive speech is 86%, while the highest F-scores for detecting hate speech with respect to ethnicity, national origin, and religious affiliation are 80%, 81%, and 72%, respectively. These results are encouraging and provide a starting point for further investigation in this domain.
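The abstract reports macro-averaged F-scores. As a reminder of what that metric measures, the following is a minimal sketch of the standard definition: the unweighted mean of per-class F1 scores, so that rare classes count as much as frequent ones. The function name `macro_f1`, the label strings, and the toy data are illustrative assumptions and are not taken from the paper's corpus or experiments.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
y_true = ["neutral", "offensive", "hate", "hate", "neutral", "offensive"]
y_pred = ["neutral", "offensive", "hate", "offensive", "neutral", "neutral"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the mean, a model that ignores a small class is penalized even if its overall accuracy is high, which is one common reason to report this metric on imbalanced classification data.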

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 6
June 2023
635 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3604597

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 17 June 2023
• Online AM: 19 January 2023
• Accepted: 6 January 2023
• Revised: 6 November 2022
• Received: 27 April 2022
