Explainable Abusive Language Classification Leveraging User and Network Data

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12979)

Abstract

Online hate speech is a phenomenon with considerable consequences for our society. Its automatic detection using machine learning is a promising approach to contain its spread. However, classifying abusive language with a model that relies purely on text data is limited in performance due to the complexity and diversity of speech (e.g., irony, sarcasm). Moreover, studies have shown that a significant amount of hate on social media platforms stems from online hate communities. We therefore develop an abusive language detection model that leverages user and network data to improve classification performance. We integrate the explainable-AI framework SHAP (SHapley Additive exPlanations) to alleviate the general issue of missing transparency associated with deep learning models, allowing us to reliably assess the model's vulnerability to bias and systematic discrimination. Furthermore, we evaluate our multimodal architecture on three datasets in two languages (English and German). Our results show that user-specific timeline and network data can improve the classification, while the additional explanations resulting from SHAP make the model's predictions interpretable to humans.

Warning: This paper contains content that may be abusive or offensive.
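The SHAP framework named in the abstract is grounded in Shapley values from cooperative game theory: a feature's attribution is its marginal contribution to the prediction, averaged over all feature subsets. As a minimal sketch (not the paper's implementation), the exact computation can be written directly from that definition; the toy scoring function and feature names below are hypothetical, and exact enumeration is tractable only for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: for each feature, average its marginal
    contribution value(S ∪ {f}) - value(S) over all subsets S of the
    remaining features, weighted by |S|!(n-|S|-1)!/n!."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

# Hypothetical toy "model": the score is 1.0 if a slur token is active,
# plus 0.2 if a hateful-network signal is active (an additive game, so
# each feature's Shapley value equals its individual contribution).
def toy_model(active):
    score = 0.0
    if "text:slur" in active:
        score += 1.0
    if "network:hateful_cluster" in active:
        score += 0.2
    return score

features = ["text:slur", "text:greeting", "network:hateful_cluster"]
print(shapley_values(features, toy_model))
```

The SHAP library approximates these values efficiently for real models; this brute-force version only illustrates what is being approximated.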


Notes

  1. Code available at https://github.com/mawic/multimodal-abusive-language-detection.

  2. https://huggingface.co/transformers/.

  3. If a user is mentioned in a tweet, an “@” symbol appears before the user name.

  4. Network data is not available for all users.
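Footnote 3 describes how mentions are marked in tweets; such mentions are one way to reconstruct the user interaction network the model draws on. A minimal, hypothetical sketch of extracting them is below; the pattern is a simplification of Twitter's actual handle rules (it also fires inside e-mail addresses, for example):

```python
import re

# Simplified Twitter-style @mention pattern: letters, digits, and
# underscore, up to 15 characters after the "@".
MENTION_RE = re.compile(r"@(\w{1,15})")

def extract_mentions(tweet: str) -> list[str]:
    """Return the user names mentioned in a tweet, in order."""
    return MENTION_RE.findall(tweet)

print(extract_mentions("@alice I agree with @bob_42, see you there!"))
# ['alice', 'bob_42']
```

Aggregating such mention pairs over many tweets yields the edges of a mention graph on which network features can be computed.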


Acknowledgments

We would like to thank Anika Apel and Mariam Khuchua for their contribution to this project. The research has been partially funded by a scholarship from the Hanns Seidel Foundation financed by the German Federal Ministry of Education and Research.

Author information

Corresponding author

Correspondence to Maximilian Wich.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Wich, M., Mosca, E., Gorniak, A., Hingerl, J., Groh, G. (2021). Explainable Abusive Language Classification Leveraging User and Network Data. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science(), vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_30

  • DOI: https://doi.org/10.1007/978-3-030-86517-7_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86516-0

  • Online ISBN: 978-3-030-86517-7

