MemeGraphs: Linking Memes to Knowledge Graphs

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Abstract

Memes are a popular form of communicating trends and ideas in social media and on the internet in general, combining the modalities of images and text. They can express humor and sarcasm but can also have offensive content. Analyzing and classifying memes automatically is challenging since their interpretation relies on the understanding of visual elements, language, and background knowledge. Thus, it is important to meaningfully represent these sources and the interaction between them in order to classify a meme as a whole. In this work, we propose to use scene graphs, that express images in terms of objects and their visual relations, and knowledge graphs as structured representations for meme classification with a Transformer-based architecture. We compare our approach with ImgBERT, a multimodal model that uses only learned (instead of structured) representations of the meme, and observe consistent improvements. We further provide a dataset with human graph annotations that we compare to automatically generated graphs and entity linking. Analysis shows that automatic methods link more entities than human annotators and that automatically generated graphs are better suited for hatefulness classification in memes.
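
The abstract summarizes the approach only at a high level: meme text is combined with structured representations (scene-graph triples and linked knowledge-graph entities) and classified with a Transformer. As a rough, hedged illustration of that idea, the sketch below serializes the structured information into plain text and feeds it to a BERT sequence classifier. The helper name `build_input`, the example strings, and the serialization format are illustrative assumptions, not the authors' implementation (their code is linked in note 2); the concatenation with full stops follows note 9 below.

```python
# Illustrative sketch only: serialization and wiring are assumptions,
# not the MemeGraphs reference implementation.
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def build_input(meme_text, scene_triples, entity_facts):
    """Serialize meme text plus structured graph information into one string.

    scene_triples: e.g. [("man", "wearing", "suit")] from a scene graph generator.
    entity_facts:  e.g. ["Donald Trump: 45th president of the United States"]
                   from entity linking against a knowledge graph such as Wikidata.
    """
    scene_part = ". ".join(" ".join(triple) for triple in scene_triples)
    knowledge_part = ". ".join(entity_facts)
    # Concatenate all text segments with a full stop (cf. note 9).
    return ". ".join(part for part in [meme_text, scene_part, knowledge_part] if part)

text = build_input(
    "example meme caption",
    [("man", "wearing", "suit")],
    ["Donald Trump: 45th president of the United States"],
)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
logits = model(**inputs).logits  # two logits: not hateful vs. hateful
```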


Notes

  1. Disclaimer: This paper contains examples of hateful content.

  2. The code and data for the MemeGraphs method are available at https://github.com/vasilikikou/memegraphs.

  3. https://www.drivendata.org/competitions/64/hateful-memes/.

  4. The method was called Concat BERT in [13] and ImgBERT in [14]. Here we call it ImgBERT because we use their implementation.

  5. https://www.kaggle.com/datasets/SIZZLE/2016electionmemes.

  6. This number was chosen after inspecting the scene graphs generated for the memes, in order to avoid overcrowded scene graphs with too many objects.

  7. https://spacy.io/universe/project/spacy-transformers.

  8. https://www.wikidata.org/wiki/Wikidata:Main_Page (a sketch of entity linking against Wikidata follows these notes).

  9. All the texts are concatenated with a full stop.

  10. https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForSequenceClassification.
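
As referenced in note 8, the following is a minimal, hedged sketch of how entities mentioned in a meme's text could be recognised with a transformer-based spaCy pipeline (note 7) and linked to Wikidata (note 8) via its public wbsearchentities search API. The pipeline name, the query strategy, and the choice of the top-ranked candidate are assumptions for illustration, not the exact procedure used in the paper.

```python
# Illustrative only: pipeline name, query strategy, and candidate selection
# are assumptions, not the paper's exact entity-linking procedure.
import requests
import spacy

nlp = spacy.load("en_core_web_trf")  # transformer-based spaCy pipeline (note 7)

def link_entities(meme_text):
    """Return (surface form, Wikidata id, description) for recognised entities."""
    links = []
    for ent in nlp(meme_text).ents:  # named entities found by spaCy
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbsearchentities",  # Wikidata entity search (note 8)
                "search": ent.text,
                "language": "en",
                "format": "json",
            },
            timeout=10,
        ).json()
        if resp.get("search"):
            top = resp["search"][0]  # naively keep the top-ranked candidate
            links.append((ent.text, top["id"], top.get("description", "")))
    return links
```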

References

  1. Aggarwal, P., Liman, M.E., Gold, D., Zesch, T.: VL-BERT+: detecting protected groups in hateful multimodal memes. In: Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), pp. 207–214. Online, August 2021. https://doi.org/10.18653/v1/2021.woah-1.22

  2. Luz de Araujo, P.H., Roth, B.: Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection. In: Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP, pp. 75–83. Dublin, Ireland (2022)

  3. Behera, P., Mamta, Ekbal, A.: Only text? Only image? Or both? Predicting sentiment of internet memes. In: Proceedings of the 17th International Conference on Natural Language Processing (ICON), pp. 444–452. Indian Institute of Technology Patna, Patna, India (2020). https://aclanthology.org/2020.icon-main.60

  4. Blaier, E., Malkiel, I., Wolf, L.: Caption enriched samples for improving hateful memes detection. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 9350–9358. Online and Punta Cana, Dominican Republic (2021). https://doi.org/10.18653/v1/2021.emnlp-main.738

  5. Chen, S., Aguilar, G., Neves, L., Solorio, T.: Can images help recognize entities? A study of the role of images for multimodal NER. In: Proceedings of the 2021 EMNLP Workshop W-NUT: The Seventh Workshop on Noisy User-Generated Text, pp. 87–96. Online and Punta Cana, Dominican Republic (2021)

  6. Chen, Y.-C., et al.: UNITER: UNiversal image-TExt representation learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12375, pp. 104–120. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58577-8_7

  7. Das, A., Wahi, J.S., Li, S.: Detecting hate speech in multi-modal memes. arXiv:2012.14891 (2020)

  8. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Miami Beach, FL, USA (2009)

  9. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Minneapolis, Minnesota, USA (2019)

  10. Dimitrov, D., et al.: Detecting propaganda techniques in memes. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 6603–6617. Online (2021). https://doi.org/10.18653/v1/2021.acl-long.516

  11. Fersini, E., et al.: SemEval-2022 task 5: Multimedia automatic misogyny identification. In: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pp. 533–549. Seattle, United States (2022)

  12. Gan, Z., Chen, Y.C., Li, L., Zhu, C., Cheng, Y., Liu, J.: Large-scale adversarial training for vision-and-language representation learning. arXiv:2006.06195 (2020)

  13. Kiela, D., et al.: The hateful memes challenge: detecting hate speech in multimodal memes. Adv. Neural. Inf. Process. Syst. 33, 2611–2624 (2020)

  14. Kougia, V., Pavlopoulos, J.: Multimodal or text? Retrieval or BERT? Benchmarking classifiers for the shared task on hateful memes. In: Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), pp. 220–225. Online (2021). https://doi.org/10.18653/v1/2021.woah-1.24

  15. Krishna, R., et al.: Visual genome: connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vision 123(1), 32–73 (2017)

  16. Lee, R.K.W., Cao, R., Fan, Z., Jiang, J., Chong, W.H.: Disentangling hate in online memes. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 5138–5147 (2021)

  17. Li, J., Ataman, D., Sennrich, R.: Vision matters when it should: sanity checking multimodal machine translation models. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 8556–8562. Online and Punta Cana, Dominican Republic (2021)

  18. Li, L.H., Yatskar, M., Yin, D., Hsieh, C.J., Chang, K.W.: VisualBERT: a simple and performant baseline for vision and language. arXiv:1908.03557 (2019)

  19. Li, X., et al.: Oscar: object-semantics aligned pre-training for vision-language tasks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12375, pp. 121–137. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58577-8_8

  20. Lu, J., Batra, D., Parikh, D., Lee, S.: ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. arXiv:1908.02265 (2019)

  21. Mathias, L., et al.: Findings of the WOAH 5 shared task on fine grained hateful memes detection. In: Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), pp. 201–206. Online (2021). https://doi.org/10.18653/v1/2021.woah-1.21

  22. Mozes, M., Schmitt, M., Golkov, V., Schütze, H., Cremers, D.: Scene graph generation for better image captioning? arXiv:2109.11398 (2021)

  23. Pramanick, S., Sharma, S., Dimitrov, D., Akhtar, M.S., Nakov, P., Chakraborty, T.: MOMENTA: a multimodal framework for detecting harmful memes and their targets. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 4439–4455. Punta Cana, Dominican Republic (2021). https://doi.org/10.18653/v1/2021.findings-emnlp.379

  24. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763 (2021)

  25. Sharifzadeh, S., Baharlou, S.M., Schmitt, M., Schütze, H., Tresp, V.: Improving scene graph classification by exploiting knowledge from texts. Proc. AAAI Conf. Artif. Intell. 36(2), 2189–2197 (2022)

  26. Sharifzadeh, S., Baharlou, S.M., Tresp, V.: Classification by attention: scene graph classification with prior knowledge. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), pp. 5025–5033. Online (2021)

  27. Su, W., et al.: VL-BERT: pre-training of generic visual-linguistic representations. arXiv:1908.08530 (2019)

  28. Suryawanshi, S., Chakravarthi, B.R., Arcan, M., Buitelaar, P.: Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text. In: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pp. 32–41. Marseille, France (2020). https://aclanthology.org/2020.trac-1.6

  29. Yang, J., Lu, J., Lee, S., Batra, D., Parikh, D.: Graph R-CNN for scene graph generation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 690–706. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_41

  30. Yang, X., Tang, K., Zhang, H., Cai, J.: Auto-encoding scene graphs for image captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10685–10694. Long Beach, CA, USA (2019)

  31. Yin, Y., Meng, F., Su, J., Zhou, C., Yang, Z., Zhou, J., Luo, J.: A novel graph-based multi-modal fusion encoder for neural machine translation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 3025–3035. Online (2020)

  32. Zellers, R., Yatskar, M., Thomson, S., Choi, Y.: Neural motifs: scene graph parsing with global context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5831–5840 (2018)

  33. Zhu, R.: Enhance multimodal transformer with external label and in-domain pretrain: hateful meme challenge winning solution. arXiv:2012.08290 (2020)

Acknowledgements

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - RO 5127/2-1 and the Vienna Science and Technology Fund (WWTF) [10.47379/VRG19008]. We thank Christos Bintsis for participating in the manual augmentation. We also thank Matthias Aßenmacher and the anonymous reviewers for their valuable feedback.

Author information

Corresponding author

Correspondence to Vasiliki Kougia.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kougia, V. et al. (2023). MemeGraphs: Linking Memes to Knowledge Graphs. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham. https://doi.org/10.1007/978-3-031-41676-7_31

  • DOI: https://doi.org/10.1007/978-3-031-41676-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41675-0

  • Online ISBN: 978-3-031-41676-7

  • eBook Packages: Computer Science, Computer Science (R0)
