Detecting hate speech in memes: a review

Published in: Artificial Intelligence Review

Abstract

Methods for detecting hate speech in memes have become vital in our connected society, particularly for social media companies. Memes are a quick way to transfer ideas, events, or other real-world content into the digital sphere. Created en masse, they spread like viruses and are designed to capture attention. They are powerful tools that, when used to spread hate speech, can achieve global reach. The term "meme" is broadly defined and covers different formats, such as short videos, GIFs, and challenges; in this paper, we focus on the classical format of an image with superimposed text. In this context, hateful meme detection is an extremely challenging task, chiefly because of memes' multimodal nature: they combine two different sources of information, image and text. Consequently, a classification model must handle both components in order to label a meme as hateful or non-hateful. This work contributes to the effort to solve this task. We survey the most recent research, synthesize and discuss the approaches proposed in the current literature, and provide a critical analysis of these methods, highlighting their strengths and areas for improvement. We also introduce a taxonomy for grouping similar approaches. We conclude that, despite the few studies currently available and the scarcity of public datasets specifically designed for this topic, the methodologies are evolving, and this evolution is reflected in the results attained.
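The multimodal requirement described above — a classifier must combine evidence from both the image and the superimposed text — is commonly realized through feature fusion. As a minimal illustration (not the method of any specific paper surveyed here), the sketch below contrasts the two standard strategies: early fusion, which joins per-modality feature vectors before classification, and late fusion, which combines per-modality prediction scores. The feature extractors are stubbed with fixed vectors; the names and weights are hypothetical.

```python
# Hypothetical sketch of the two fusion strategies used for multimodal
# (image + text) meme classification. Real systems would obtain these
# vectors from, e.g., a CNN (image) and a text encoder such as BERT.

def early_fusion(img_feats, txt_feats):
    """Early fusion: concatenate modality features into one joint vector,
    which is then fed to a single downstream classifier."""
    return img_feats + txt_feats  # list concatenation


def late_fusion(img_score, txt_score, w_img=0.5):
    """Late fusion: each modality is classified separately; the resulting
    hatefulness scores are combined by a weighted average."""
    return w_img * img_score + (1 - w_img) * txt_score


# Stand-ins for extracted embeddings (illustrative values only).
img_feats = [0.2, 0.7, 0.1]   # image embedding stub
txt_feats = [0.9, 0.3]        # text embedding stub

joint = early_fusion(img_feats, txt_feats)   # 5-dimensional joint vector
score = late_fusion(0.8, 0.4)                # averaged modality scores

print(len(joint))
print(score)
```

Early fusion lets the classifier model cross-modal interactions (e.g., benign text made hateful by its image), which is why the transformer-based approaches discussed in this review favor joint image-text representations over simple score averaging.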



Funding

The authors received no funding or other support from any organization for the conduct of this study or the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Paulo Cezar de Q. Hermida.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest. The authors have no relevant financial or non-financial competing interests to disclose, and no affiliations with or involvement in any organization or entity with a financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hermida, P.C.d.Q., Santos, E.M.d. Detecting hate speech in memes: a review. Artif Intell Rev 56, 12833–12851 (2023). https://doi.org/10.1007/s10462-023-10459-7
