Impact of Transformers on Multilingual Fake News Detection for Tamil and Malayalam

  • Conference paper
Speech and Language Technologies for Low-Resource Languages (SPELLL 2022)

Abstract

Because the technology stack for implementing state-of-the-art neural networks is now readily available, fake news (or fake information) classification has attracted many researchers working in natural language processing, machine learning, and deep learning. Currently, most work on fake news detection targets English, which limits its usefulness outside the English-speaking population. For multilingual content, fake news classification in low-resource languages is challenging because sufficiently large annotated corpora are not available. In this work, we study and analyze the impact of transformer-based models such as multilingual BERT, XLM-RoBERTa, and MuRIL on multilingual low-resource fake news classification, using a dataset created (by translation) as part of this research. We conduct various experiments, including language-specific settings and comparisons across models, to measure the impact of each model. We also release the multilingual dataset in Tamil and Malayalam, drawn from multiple domains, which could be useful for further research in this direction. The datasets and code are available on GitHub (https://github.com/hariharanrl/Multilingual_Fake_News).
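The abstract describes comparing three transformer models across language-specific settings. A minimal sketch of such an experiment grid follows. It assumes Hugging Face Hub checkpoint IDs for the three models, three language settings (Tamil only, Malayalam only, both pooled), a 1 = fake / 0 = real label encoding, and macro-F1 as the comparison metric; none of these choices are confirmed by the abstract. The `fit` callable is a hypothetical stand-in for fine-tuning (for example via `AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)` in the `transformers` library), so the grid logic itself runs without downloading any model.

```python
"""Sketch of a model x language-setting grid for binary fake news classification.

Assumptions (not from the paper): Hub checkpoint IDs, language settings,
label encoding, and macro-F1 as the comparison metric.
"""
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed Hugging Face Hub checkpoints for the three models named in the abstract.
MODELS: Dict[str, str] = {
    "mBERT": "bert-base-multilingual-cased",
    "XLM-RoBERTa": "xlm-roberta-base",
    "MuRIL": "google/muril-base-cased",
}

# Assumed settings: each language alone, then both languages pooled.
LANGUAGE_SETTINGS: Dict[str, Tuple[str, ...]] = {
    "ta": ("ta",),
    "ml": ("ml",),
    "ta+ml": ("ta", "ml"),
}


@dataclass(frozen=True)
class Example:
    text: str
    label: int  # hypothetical encoding: 1 = fake, 0 = real
    lang: str   # ISO 639-1 code: "ta" (Tamil) or "ml" (Malayalam)


# A hypothetical "fit" function takes a checkpoint ID and training examples and
# returns a predictor mapping text -> label. In practice it would fine-tune a
# model, e.g. AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=2).
Predictor = Callable[[str], int]
FitFn = Callable[[str, List[Example]], Predictor]


def select(data: Sequence[Example], langs: Tuple[str, ...]) -> List[Example]:
    """Keep only the examples whose language is in `langs`."""
    return [ex for ex in data if ex.lang in langs]


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def run_grid(train: Sequence[Example], test: Sequence[Example],
             fit: FitFn) -> Dict[Tuple[str, str], float]:
    """Train and evaluate every (model, language setting) pair."""
    results: Dict[Tuple[str, str], float] = {}
    for (name, ckpt), (setting, langs) in product(MODELS.items(),
                                                  LANGUAGE_SETTINGS.items()):
        predict = fit(ckpt, select(train, langs))
        held_out = select(test, langs)
        gold = [ex.label for ex in held_out]
        pred = [predict(ex.text) for ex in held_out]
        results[(name, setting)] = macro_f1(gold, pred)
    return results


if __name__ == "__main__":
    # Toy placeholder data and a majority-class "fit" stand-in, so the grid
    # runs without downloading models; the scores are not real results.
    def fit_majority(ckpt: str, data: List[Example]) -> Predictor:
        majority = int(2 * sum(ex.label for ex in data) >= len(data))
        return lambda text: majority

    train = [Example("ta-1", 1, "ta"), Example("ml-1", 0, "ml")]
    test = [Example("ta-2", 1, "ta"), Example("ml-2", 0, "ml")]
    for key, score in run_grid(train, test, fit_majority).items():
        print(key, round(score, 3))
```

Passing `fit` as a parameter keeps the grid loop separate from the training code, so the same loop can drive a real fine-tuning routine or a cheap baseline such as the majority-class stand-in above.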

Author information

Corresponding author

Correspondence to RamakrishnaIyer LekshmiAmmal Hariharan.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hariharan, R.L., Anand Kumar, M. (2023). Impact of Transformers on Multilingual Fake News Detection for Tamil and Malayalam. In: M, A.K., et al. Speech and Language Technologies for Low-Resource Languages. SPELLL 2022. Communications in Computer and Information Science, vol 1802. Springer, Cham. https://doi.org/10.1007/978-3-031-33231-9_13

  • DOI: https://doi.org/10.1007/978-3-031-33231-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33230-2

  • Online ISBN: 978-3-031-33231-9

  • eBook Packages: Computer Science, Computer Science (R0)