Impact of Transformers on Multilingual Fake News Detection for Tamil and Malayalam

Hariharan, RamakrishnaIyer LekshmiAmmal; Anand Kumar, Madasamy

doi:10.1007/978-3-031-33231-9_13

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1802)

Included in the following conference series:

International Conference on Speech and Language Technologies for Low-resource Languages

Abstract

Readily available tooling for implementing state-of-the-art neural networks has drawn many researchers in natural language processing, machine learning, and deep learning to the problem of classifying fake news and other false information. Most existing work on fake news detection targets English, which limits its usefulness outside the English-speaking population. For multilingual content, fake news classification in low-resource languages is challenging because sufficient annotated corpora are not available. In this work, we study and analyze the impact of different transformer-based models, such as multilingual BERT, XLM-RoBERTa, and MuRIL, on a dataset created (by translation) as part of this research on multilingual low-resource fake news classification. We run various experiments, including language-specific settings and different models, to assess the impact of the models. We also release the multilingual dataset in Tamil and Malayalam, which covers multiple domains and could be useful for research in this direction. The datasets and code are available on GitHub (https://github.com/hariharanrl/Multilingual_Fake_News).
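A comparison of this kind could be set up roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the Hugging Face `transformers` library and the public checkpoints `bert-base-multilingual-cased`, `xlm-roberta-base`, and `google/muril-base-cased`. The helper names `experiment_grid` and `load_classifier`, the three language settings, and the binary label count are hypothetical and chosen only for illustration.

```python
from itertools import product

# Public Hugging Face checkpoint names (an assumption; the abstract does not
# state which checkpoints were used).
CHECKPOINTS = {
    "mBERT": "bert-base-multilingual-cased",
    "XLM-RoBERTa": "xlm-roberta-base",
    "MuRIL": "google/muril-base-cased",
}

# Hypothetical training settings: each language alone, then both combined.
LANGUAGE_SETTINGS = ["tamil", "malayalam", "tamil+malayalam"]


def experiment_grid():
    """Return every (model name, language setting) pair to evaluate."""
    return list(product(CHECKPOINTS, LANGUAGE_SETTINGS))


def load_classifier(model_name, num_labels=2):
    """Load a tokenizer and a sequence-classification model for one entry.

    The import is lazy so the grid can be inspected without `transformers`
    installed. num_labels=2 assumes a binary fake/real task.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    checkpoint = CHECKPOINTS[model_name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    return tokenizer, model


if __name__ == "__main__":
    # List the runs without downloading any model weights.
    for model_name, setting in experiment_grid():
        print(f"{model_name:12s} fine-tuned on {setting}")
```

Calling `load_classifier("MuRIL")` would download the checkpoint on first use. The actual data loading, fine-tuning loop, and evaluation are left out because the abstract does not describe them.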

Author information

Authors and Affiliations

Department of Information Technology, National Institute of Technology Karnataka, Surathkal, India
RamakrishnaIyer LekshmiAmmal Hariharan & Madasamy Anand Kumar

Corresponding author

Correspondence to RamakrishnaIyer LekshmiAmmal Hariharan.

Editor information

Editors and Affiliations

National Institute of Technology Karnataka, Mangalore, India
Anand Kumar M
National University of Ireland, Galway, Ireland
Bharathi Raja Chakravarthi
Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India
Bharathi B
National University of Ireland, Galway, Ireland
Colm O’Riordan
Indian Institute of Technology Madras, Chennai, India
Hema Murthy
Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India
Thenmozhi Durairaj
University of Hildesheim, Hildesheim, Germany
Thomas Mandl

About this paper

Cite this paper

Hariharan, R.L., Anand Kumar, M. (2023). Impact of Transformers on Multilingual Fake News Detection for Tamil and Malayalam. In: M, A.K., et al. Speech and Language Technologies for Low-Resource Languages. SPELLL 2022. Communications in Computer and Information Science, vol 1802. Springer, Cham. https://doi.org/10.1007/978-3-031-33231-9_13

DOI: https://doi.org/10.1007/978-3-031-33231-9_13
Published: 29 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33230-2
Online ISBN: 978-3-031-33231-9
eBook Packages: Computer Science, Computer Science (R0)
