Abstract
Hate speech detection is the task of identifying hateful content that denigrates an individual or a group based on religion, gender, sexual orientation, or other characteristics. Because platforms enforce different policies, different groups of people express hate in different ways. Moreover, the scarcity of labeled data on some platforms makes it challenging to build hate speech detection models for them. To this end, we revisit whether a generalizable hate speech detection model can be learned in the cross-platform setting, where the model is trained on data from one (source) platform and generalized across multiple (target) platforms. Existing generalization models rely on linguistic cues or auxiliary information, making them biased towards certain tags or certain kinds of words (e.g., abusive words) on the source platform and thus not applicable to the target platforms. Inspired by social and psychological theories, we explore whether there exist inherent causal cues that can be leveraged to learn representations that generalize across these distribution shifts. To this end, we propose a causality-guided framework, PEACE, that identifies and leverages two intrinsic causal cues omnipresent in hateful content: the overall sentiment and the aggression in the text. We conduct extensive experiments across multiple platforms (representing the distribution shift) to evaluate whether causal cues can help cross-platform generalization.
P. Sheth and T. Kumarage—Both authors contributed equally. A. Chadha—Work does not relate to the position at Amazon.
Notes
1. The code for PEACE can be accessed from: https://github.com/paras2612/PEACE.
Acknowledgements
This material is based upon work supported, in whole or in part, by the Office of Naval Research (ONR) under contract/grant number N00014-21-1-4002, the Army Research Office under grant number W911NF2110030, and the Defense Advanced Research Projects Agency (DARPA) under grant number HR001120C0123. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Ethics declarations
Ethical Statement
Freedom of Speech and Censorship. Our research aims to develop algorithms that can effectively identify and mitigate harmful language across multiple platforms. We recognize the importance of protecting individuals from the adverse effects of hate speech and the need to balance this with upholding free speech. Content moderation is one application where our method could help curb hate speech on social media platforms such as Twitter, Facebook, and Reddit. However, one ethical concern is our system's false positives: if the system incorrectly flags a user's text as hate speech, it may censor legitimate free speech. We therefore discourage incorporating our methodology into any real-world content moderation system in a purely automated manner unless a human annotator works alongside the system to make the final decision.

Use of Hate Speech Datasets. In our work, we used publicly available, well-established datasets. We have cited the corresponding dataset papers and followed the necessary steps in utilizing those datasets. We understand that the hate speech examples used in the paper are potentially harmful content that could be used for malicious activities. However, our work aims to better investigate and mitigate the harms of online hate, and we have assessed that the benefits of using these real-world examples to explain our work outweigh the potential risks.

Fairness and Bias in Detection. Our work values the principles of fairness and impartiality. To reduce biases and ethical problems, we openly disclose our methodology, results, and limitations, and we will continue to assess and improve our system.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sheth, P., Kumarage, T., Moraffah, R., Chadha, A., Liu, H. (2023). PEACE: Cross-Platform Hate Speech Detection - A Causality-Guided Framework. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_33
DOI: https://doi.org/10.1007/978-3-031-43412-9_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43411-2
Online ISBN: 978-3-031-43412-9