ABSTRACT
The exponential growth of social media platforms has allowed people to connect worldwide. However, it has also fueled the spread of harmful and abusive content on the Internet. Repeated exposure to abusive content may have psychological effects on the targeted users. It is therefore necessary to detect abusive content in all its forms to keep these platforms safe and healthy. Several works have addressed abusive speech detection so far; however, most of them are text-based, whereas social media content is often multimodal, comprising text, images, videos, etc. Internet memes have recently emerged as a predominant mode of content shared on social media and are used to express vitriol or harm toward others, so detecting abusive memes is essential. Although several works have addressed abusive/harmful meme detection, most of them use English data, with only very few extending to non-English datasets. One immediate solution is therefore to learn to detect abusive memes in one language and transfer the resulting models to other languages. This work explores several model transfer techniques to bridge this gap by creating various baseline models.