Abstract
Owing to the availability of technology stacks for implementing state-of-the-art neural networks, fake news (or fake information) classification has attracted many researchers working in natural language processing, machine learning, and deep learning. Currently, most work on fake news detection targets English, which limits its usefulness outside the English-speaking population. For multilingual content, fake news classification in low-resource languages is challenging because sufficiently large annotated corpora are unavailable. In this work, we study and analyze the impact of different transformer-based models, such as multilingual BERT, XLM-RoBERTa, and MuRIL, on a multilingual low-resource fake news classification dataset that we created (by translation) as part of this research. We conduct various experiments, including language-specific experiments and comparisons across models, to assess the impact of each model. We also release the multilingual dataset in Tamil and Malayalam, drawn from multiple domains, which could be useful for further research in this direction. The datasets and code are available on GitHub (https://github.com/hariharanrl/Multilingual_Fake_News).
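The abstract mentions language-specific experiments. As a hedged illustration only, the sketch below shows one way to report a classifier's predictions separately for each language. It assumes macro-averaged F1 as the metric and uses hypothetical label names (`fake`, `real`), language codes (`ta`, `ml`), and function names; none of these are taken from the chapter or its repository.

```python
from collections import defaultdict


def f1_for_label(gold, pred, label):
    """F1 score treating `label` as the positive class (0.0 if no true positives)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


def per_language_macro_f1(records):
    """Group (language, gold_label, predicted_label) triples by language
    (hypothetical record layout) and compute macro-F1 within each group."""
    by_lang = defaultdict(lambda: ([], []))  # language -> (gold list, pred list)
    for lang, gold, pred in records:
        by_lang[lang][0].append(gold)
        by_lang[lang][1].append(pred)
    return {lang: macro_f1(g, p) for lang, (g, p) in by_lang.items()}
```

For example, with the records `[("ta", "fake", "fake"), ("ta", "real", "fake"), ("ml", "real", "real"), ("ml", "fake", "fake")]`, the function returns a macro-F1 of about 0.333 for `ta` and 1.0 for `ml`. Reporting scores per language in this way keeps a strong result in one language from masking a weak one in another.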
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hariharan, R.L., Anand Kumar, M. (2023). Impact of Transformers on Multilingual Fake News Detection for Tamil and Malayalam. In: M, A.K., et al. Speech and Language Technologies for Low-Resource Languages. SPELLL 2022. Communications in Computer and Information Science, vol 1802. Springer, Cham. https://doi.org/10.1007/978-3-031-33231-9_13
DOI: https://doi.org/10.1007/978-3-031-33231-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33230-2
Online ISBN: 978-3-031-33231-9
eBook Packages: Computer Science, Computer Science (R0)