MemeGraphs: Linking Memes to Knowledge Graphs

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Abstract

Memes are a popular form of communicating trends and ideas in social media and on the internet in general, combining the modalities of images and text. They can express humor and sarcasm but can also have offensive content. Analyzing and classifying memes automatically is challenging since their interpretation relies on the understanding of visual elements, language, and background knowledge. Thus, it is important to meaningfully represent these sources and the interaction between them in order to classify a meme as a whole. In this work, we propose to use scene graphs, that express images in terms of objects and their visual relations, and knowledge graphs as structured representations for meme classification with a Transformer-based architecture. We compare our approach with ImgBERT, a multimodal model that uses only learned (instead of structured) representations of the meme, and observe consistent improvements. We further provide a dataset with human graph annotations that we compare to automatically generated graphs and entity linking. Analysis shows that automatic methods link more entities than human annotators and that automatically generated graphs are better suited for hatefulness classification in memes.
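
The abstract summarizes the approach only at a high level: meme text is combined with structured representations (scene-graph triples and linked knowledge-graph entities) and classified with a Transformer. As a rough, hedged illustration of that idea, the sketch below serializes the structured information into plain text and feeds it to a BERT sequence classifier. The helper name `build_input`, the example strings, and the serialization format are illustrative assumptions, not the authors' implementation (their code is linked in note 2); the concatenation with full stops follows note 9 below.

```python
# Illustrative sketch only: serialization and wiring are assumptions,
# not the MemeGraphs reference implementation.
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def build_input(meme_text, scene_triples, entity_facts):
    """Serialize meme text plus structured graph information into one string.

    scene_triples: e.g. [("man", "wearing", "suit")] from a scene graph generator.
    entity_facts:  e.g. ["Donald Trump: 45th president of the United States"]
                   from entity linking against a knowledge graph such as Wikidata.
    """
    scene_part = ". ".join(" ".join(triple) for triple in scene_triples)
    knowledge_part = ". ".join(entity_facts)
    # Concatenate all text segments with a full stop (cf. note 9).
    return ". ".join(part for part in [meme_text, scene_part, knowledge_part] if part)

text = build_input(
    "example meme caption",
    [("man", "wearing", "suit")],
    ["Donald Trump: 45th president of the United States"],
)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
logits = model(**inputs).logits  # two logits: not hateful vs. hateful
```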


Notes

  1. Disclaimer: This paper contains examples of hateful content.

  2. The code and data for the MemeGraphs method are available at https://github.com/vasilikikou/memegraphs.

  3. https://www.drivendata.org/competitions/64/hateful-memes/.

  4. The method was called Concat BERT in [13] and ImgBERT in [14]. Here we call it ImgBERT because we use their implementation.

  5. https://www.kaggle.com/datasets/SIZZLE/2016electionmemes.

  6. This number was chosen after inspecting the scene graphs generated for the memes, in order to avoid overcrowded scene graphs with too many objects.

  7. https://spacy.io/universe/project/spacy-transformers.

  8. https://www.wikidata.org/wiki/Wikidata:Main_Page (a sketch of entity linking against Wikidata follows these notes).

  9. All the texts are concatenated with a full stop.

  10. https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForSequenceClassification.
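
As referenced in note 8, the following is a minimal, hedged sketch of how entities mentioned in a meme's text could be recognised with a transformer-based spaCy pipeline (note 7) and linked to Wikidata (note 8) via its public wbsearchentities search API. The pipeline name, the query strategy, and the choice of the top-ranked candidate are assumptions for illustration, not the exact procedure used in the paper.

```python
# Illustrative only: pipeline name, query strategy, and candidate selection
# are assumptions, not the paper's exact entity-linking procedure.
import requests
import spacy

nlp = spacy.load("en_core_web_trf")  # transformer-based spaCy pipeline (note 7)

def link_entities(meme_text):
    """Return (surface form, Wikidata id, description) for recognised entities."""
    links = []
    for ent in nlp(meme_text).ents:  # named entities found by spaCy
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbsearchentities",  # Wikidata entity search (note 8)
                "search": ent.text,
                "language": "en",
                "format": "json",
            },
            timeout=10,
        ).json()
        if resp.get("search"):
            top = resp["search"][0]  # naively keep the top-ranked candidate
            links.append((ent.text, top["id"], top.get("description", "")))
    return links
```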

References

  1. Aggarwal, P., Liman, M.E., Gold, D., Zesch, T.: VL-BERT+: detecting protected groups in hateful multimodal memes. In: Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), pp. 207–214. Online, August 2021. https://doi.org/10.18653/v1/2021.woah-1.22

  2. Luz de Araujo, P.H., Roth, B.: Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection. In: Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP, pp. 75–83. Dublin, Ireland (2022)

  3. Behera, P., Mamta, Ekbal, A.: Only text? Only image? Or both? Predicting sentiment of internet memes. In: Proceedings of the 17th International Conference on Natural Language Processing (ICON), pp. 444–452. Indian Institute of Technology Patna, Patna, India (2020). https://aclanthology.org/2020.icon-main.60

  4. Blaier, E., Malkiel, I., Wolf, L.: Caption enriched samples for improving hateful memes detection. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 9350–9358. Online and Punta Cana, Dominican Republic (2021). https://doi.org/10.18653/v1/2021.emnlp-main.738

  5. Chen, S., Aguilar, G., Neves, L., Solorio, T.: Can images help recognize entities? A study of the role of images for multimodal NER. In: Proceedings of the 2021 EMNLP Workshop W-NUT: The Seventh Workshop on Noisy User-Generated Text, pp. 87–96. Online and Punta Cana, Dominican Republic (2021)

  6. Chen, Y.-C., et al.: UNITER: UNiversal image-TExt representation learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12375, pp. 104–120. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58577-8_7

  7. Das, A., Wahi, J.S., Li, S.: Detecting hate speech in multi-modal memes. arXiv:2012.14891 (2020)

  8. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Miami Beach, FL, USA (2009)

  9. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Minneapolis, Minnesota, USA (2019)

  10. Dimitrov, D., et al.: Detecting propaganda techniques in memes. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 6603–6617. Online (2021). https://doi.org/10.18653/v1/2021.acl-long.516

  11. Fersini, E., et al.: SemEval-2022 task 5: Multimedia automatic misogyny identification. In: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pp. 533–549. Seattle, United States (2022)

  12. Gan, Z., Chen, Y.C., Li, L., Zhu, C., Cheng, Y., Liu, J.: Large-scale adversarial training for vision-and-language representation learning. arXiv:2006.06195 (2020)

  13. Kiela, D., et al.: The hateful memes challenge: detecting hate speech in multimodal memes. Adv. Neural. Inf. Process. Syst. 33, 2611–2624 (2020)

  14. Kougia, V., Pavlopoulos, J.: Multimodal or text? Retrieval or BERT? Benchmarking classifiers for the shared task on hateful memes. In: Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), pp. 220–225. Online (2021). https://doi.org/10.18653/v1/2021.woah-1.24

  15. Krishna, R., et al.: Visual genome: connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vision 123(1), 32–73 (2017)

  16. Lee, R.K.W., Cao, R., Fan, Z., Jiang, J., Chong, W.H.: Disentangling hate in online memes. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 5138–5147 (2021)

  17. Li, J., Ataman, D., Sennrich, R.: Vision matters when it should: sanity checking multimodal machine translation models. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 8556–8562. Online and Punta Cana, Dominican Republic (2021)

  18. Li, L.H., Yatskar, M., Yin, D., Hsieh, C.J., Chang, K.W.: VisualBERT: a simple and performant baseline for vision and language. arXiv:1908.03557 (2019)

  19. Li, X., et al.: Oscar: object-semantics aligned pre-training for vision-language tasks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12375, pp. 121–137. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58577-8_8

  20. Lu, J., Batra, D., Parikh, D., Lee, S.: ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. arXiv:1908.02265 (2019)

  21. Mathias, L., et al.: Findings of the WOAH 5 shared task on fine grained hateful memes detection. In: Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), pp. 201–206. Online (2021). https://doi.org/10.18653/v1/2021.woah-1.21

  22. Mozes, M., Schmitt, M., Golkov, V., Schütze, H., Cremers, D.: Scene graph generation for better image captioning? arXiv:2109.11398 (2021)

  23. Pramanick, S., Sharma, S., Dimitrov, D., Akhtar, M.S., Nakov, P., Chakraborty, T.: MOMENTA: a multimodal framework for detecting harmful memes and their targets. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 4439–4455. Punta Cana, Dominican Republic (2021). https://doi.org/10.18653/v1/2021.findings-emnlp.379

  24. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763 (2021)

  25. Sharifzadeh, S., Baharlou, S.M., Schmitt, M., Schütze, H., Tresp, V.: Improving scene graph classification by exploiting knowledge from texts. Proc. AAAI Conf. Artif. Intell. 36(2), 2189–2197 (2022)

  26. Sharifzadeh, S., Baharlou, S.M., Tresp, V.: Classification by attention: scene graph classification with prior knowledge. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), pp. 5025–5033. Online (2021)

  27. Su, W., et al.: VL-BERT: pre-training of generic visual-linguistic representations. arXiv:1908.08530 (2019)

  28. Suryawanshi, S., Chakravarthi, B.R., Arcan, M., Buitelaar, P.: Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text. In: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pp. 32–41. Marseille, France (2020). https://aclanthology.org/2020.trac-1.6

  29. Yang, J., Lu, J., Lee, S., Batra, D., Parikh, D.: Graph R-CNN for scene graph generation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 690–706. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_41

  30. Yang, X., Tang, K., Zhang, H., Cai, J.: Auto-encoding scene graphs for image captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10685–10694. Long Beach, CA, USA (2019)

  31. Yin, Y., Meng, F., Su, J., Zhou, C., Yang, Z., Zhou, J., Luo, J.: A novel graph-based multi-modal fusion encoder for neural machine translation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 3025–3035. Online (2020)

  32. Zellers, R., Yatskar, M., Thomson, S., Choi, Y.: Neural motifs: scene graph parsing with global context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5831–5840 (2018)

  33. Zhu, R.: Enhance multimodal transformer with external label and in-domain pretrain: hateful meme challenge winning solution. arXiv:2012.08290 (2020)

Acknowledgements

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - RO 5127/2-1 and the Vienna Science and Technology Fund (WWTF) [10.47379/VRG19008]. We thank Christos Bintsis for participating in the manual augmentation. We also thank Matthias Aßenmacher and the anonymous reviewers for their valuable feedback.

Author information

Corresponding author

Correspondence to Vasiliki Kougia.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kougia, V. et al. (2023). MemeGraphs: Linking Memes to Knowledge Graphs. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham. https://doi.org/10.1007/978-3-031-41676-7_31

  • DOI: https://doi.org/10.1007/978-3-031-41676-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41675-0

  • Online ISBN: 978-3-031-41676-7

  • eBook Packages: Computer Science, Computer Science (R0)
