Multi-modal transformer using two-level visual features for fake news detection

Applied Intelligence

Abstract

Fake news containing multimedia data is now ubiquitous on the Internet, and users find it difficult to distinguish from genuine news. Automatic multi-modal fake news detectors are therefore needed. However, existing works make poor use of visual information and do not fully consider the semantic interaction between modalities. In this paper, we propose the multi-modal transformer using two-level visual features (MTTV) for fake news detection. First, we model the texts and images of news uniformly as sequences that can be processed by a transformer, and we use two-level visual features, i.e., a global feature and an entity-level feature, to improve the utilization of news images. Second, we extend the transformer model for natural language processing to a multi-modal transformer, which lets multi-modal data interact fully and captures the semantic relationships between them. In addition, we propose a scalable classifier to improve classification balance in fine-grained fake news detection, which suffers from class imbalance. Extensive experiments on two public datasets demonstrate that our method achieves significant performance improvements over state-of-the-art methods. The source code is available at https://github.com/cqu-wb/MTTV.
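
The idea of treating text and two-level visual features as one sequence can be illustrated with a minimal sketch. It is not the MTTV implementation (see the repository above for that). The function names `build_sequence` and `self_attention`, the 4-dimensional toy vectors, and the use of identity projections in place of learned query, key, and value weights are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def build_sequence(
    text_emb: List[Vector], global_feat: Vector, entity_feats: List[Vector]
) -> Tuple[List[Vector], List[str]]:
    """Concatenate text tokens, one global image vector, and entity-level
    image vectors into a single sequence, tagging each position by modality."""
    sequence = list(text_emb) + [global_feat] + list(entity_feats)
    segments = ["text"] * len(text_emb) + ["global"] + ["entity"] * len(entity_feats)
    return sequence, segments


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections); a real model would learn these projections."""
    dim = len(sequence[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in sequence]
        weights = softmax(scores)
        # Each output position is a weighted mix of every position,
        # so text tokens can attend to image features and vice versa.
        mixed = [sum(w * value[i] for w, value in zip(weights, sequence)) for i in range(dim)]
        outputs.append(mixed)
    return outputs


# Toy inputs (hypothetical values): two text tokens, one global image
# feature, and two entity-level image features.
text_emb = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
global_feat = [0.0, 0.0, 1.0, 0.0]
entity_feats = [[0.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.0, 0.0]]

sequence, segments = build_sequence(text_emb, global_feat, entity_feats)
fused = self_attention(sequence)
```

Because attention runs over the concatenated sequence, every output position mixes information from both modalities, which is the kind of cross-modal interaction the abstract describes.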

Acknowledgements

Supported by Zhejiang Lab (No. 2021KE0AB01), Open Fund of Key Laboratory of Monitoring, Evaluation and Early Warning of Territorial Spatial Planning Implementation, Ministry of Natural Resources (No. LMEE-KF2021008), Technology Innovation and Application Development Key Project of Chongqing (No. cstc2021jscx-gksbX0058), National Natural Science Foundation of China (No. 62176029), and Guangxi Key Laboratory of Trusted Software (No. kx202006).

Author information

Corresponding author

Correspondence to Yong Feng.

About this article

Cite this article

Wang, B., Feng, Y., Xiong, Xc. et al. Multi-modal transformer using two-level visual features for fake news detection. Appl Intell 53, 10429–10443 (2023). https://doi.org/10.1007/s10489-022-04055-5

  • DOI: https://doi.org/10.1007/s10489-022-04055-5
