Explainable Abusive Language Classification Leveraging User and Network Data

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12979)

Abstract

Online hate speech is a phenomenon with considerable consequences for our society. Its automatic detection using machine learning is a promising approach to contain its spread. However, classifying abusive language with a model that relies purely on text data is limited in performance due to the complexity and diversity of speech (e.g., irony, sarcasm). Moreover, studies have shown that a significant amount of hate on social media platforms stems from online hate communities. We therefore develop an abusive language detection model that leverages user and network data to improve classification performance. We integrate the explainable-AI framework SHAP (SHapley Additive exPlanations) to alleviate the general issue of missing transparency associated with deep learning models, allowing us to reliably assess the model's vulnerability to bias and systematic discrimination. Furthermore, we evaluate our multimodal architecture on three datasets in two languages (English and German). Our results show that user-specific timeline and network data can improve the classification, while the additional explanations resulting from SHAP make the model's predictions interpretable to humans.

Warning: This paper contains content that may be abusive or offensive.
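The SHAP framework named in the abstract is grounded in Shapley values from cooperative game theory: a feature's attribution is its marginal contribution to the prediction, averaged over all feature subsets. As a minimal sketch (not the paper's implementation), the exact computation can be written directly from that definition; the toy scoring function and feature names below are hypothetical, and exact enumeration is tractable only for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: for each feature, average its marginal
    contribution value(S ∪ {f}) - value(S) over all subsets S of the
    remaining features, weighted by |S|!(n-|S|-1)!/n!."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

# Hypothetical toy "model": the score is 1.0 if a slur token is active,
# plus 0.2 if a hateful-network signal is active (an additive game, so
# each feature's Shapley value equals its individual contribution).
def toy_model(active):
    score = 0.0
    if "text:slur" in active:
        score += 1.0
    if "network:hateful_cluster" in active:
        score += 0.2
    return score

features = ["text:slur", "text:greeting", "network:hateful_cluster"]
print(shapley_values(features, toy_model))
```

The SHAP library approximates these values efficiently for real models; this brute-force version only illustrates what is being approximated.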


Notes

  1. Code available at https://github.com/mawic/multimodal-abusive-language-detection.

  2. https://huggingface.co/transformers/.

  3. If a user is mentioned in a tweet, an “@” symbol appears before the user name.

  4. Network data is not available for all users.
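Footnote 3 describes how mentions are marked in tweets; such mentions are one way to reconstruct the user interaction network the model draws on. A minimal, hypothetical sketch of extracting them is below; the pattern is a simplification of Twitter's actual handle rules (it also fires inside e-mail addresses, for example):

```python
import re

# Simplified Twitter-style @mention pattern: letters, digits, and
# underscore, up to 15 characters after the "@".
MENTION_RE = re.compile(r"@(\w{1,15})")

def extract_mentions(tweet: str) -> list[str]:
    """Return the user names mentioned in a tweet, in order."""
    return MENTION_RE.findall(tweet)

print(extract_mentions("@alice I agree with @bob_42, see you there!"))
# ['alice', 'bob_42']
```

Aggregating such mention pairs over many tweets yields the edges of a mention graph on which network features can be computed.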


Acknowledgments

We would like to thank Anika Apel and Mariam Khuchua for their contribution to this project. The research has been partially funded by a scholarship from the Hanns Seidel Foundation financed by the German Federal Ministry of Education and Research.

Author information

Corresponding author

Correspondence to Maximilian Wich.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Wich, M., Mosca, E., Gorniak, A., Hingerl, J., Groh, G. (2021). Explainable Abusive Language Classification Leveraging User and Network Data. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science(), vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_30

  • DOI: https://doi.org/10.1007/978-3-030-86517-7_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86516-0

  • Online ISBN: 978-3-030-86517-7

