Multi-modal transformer using two-level visual features for fake news detection

Wang, Bin; Feng, Yong; Xiong, Xian-cai; Wang, Yong-heng; Qiang, Bao-hua

doi:10.1007/s10489-022-04055-5

Published: 18 August 2022

Applied Intelligence, Volume 53, pages 10429–10443 (2023)

Bin Wang¹,
Yong Feng¹ (ORCID: orcid.org/0000-0002-8820-8388),
Xian-cai Xiong²,³,
Yong-heng Wang⁴ &
Bao-hua Qiang⁵

Abstract

Fake news with multimedia content is ubiquitous on the Internet, and users find it difficult to tell apart from genuine news. It is therefore necessary to design automatic multi-modal fake news detectors. However, existing works make poor use of visual information and do not fully consider the semantic interaction between multi-modal data. In this paper, we propose the multi-modal transformer using two-level visual features (MTTV) for fake news detection. First, we model the texts and images of news uniformly as sequences that can be processed by a transformer, and we use two-level visual features, i.e. a global feature and an entity-level feature, to improve the utilization of news images. Second, we extend the transformer model from natural language processing to a multi-modal transformer that lets multi-modal data interact fully and captures the semantic relationships between them. In addition, we propose a scalable classifier that improves classification balance in fine-grained fake news detection under class imbalance. Extensive experiments on two public datasets demonstrate that our method achieves significant performance improvements over state-of-the-art methods. The source code is available at https://github.com/cqu-wb/MTTV.
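The idea of modeling texts and images "uniformly as sequences" for a transformer can be illustrated with a minimal sketch. It assumes that word embeddings, one global image vector (for instance from a CNN such as ResNet) and several entity-level region vectors (for instance from a detector such as Faster R-CNN) have already been projected to a common width. The modality ids `TEXT`, `IMAGE_GLOBAL`, `IMAGE_ENTITY` and the helpers `build_multimodal_sequence` and `self_attention` are hypothetical illustrations, not taken from the MTTV repository, and the attention step uses the inputs directly as queries, keys and values instead of learned projections. It only shows how, once both modalities share one sequence, every text position can attend to every visual position and vice versa.

```python
import math
from typing import List, Tuple

Vector = List[float]

# Hypothetical modality-type ids (not from the MTTV code); a full model would
# look these up in a learned embedding table and add them to each vector.
TEXT, IMAGE_GLOBAL, IMAGE_ENTITY = 0, 1, 2


def build_multimodal_sequence(
    text_emb: List[Vector], global_feat: Vector, entity_feats: List[Vector]
) -> Tuple[List[Vector], List[int]]:
    """Concatenate text tokens, one global image vector and entity-level
    region vectors into a single sequence plus per-position modality ids."""
    widths = {len(v) for v in text_emb + [global_feat] + entity_feats}
    if len(widths) != 1:
        raise ValueError("all features must be projected to the same width")
    seq = list(text_emb) + [global_feat] + list(entity_feats)
    types = [TEXT] * len(text_emb) + [IMAGE_GLOBAL] + [IMAGE_ENTITY] * len(entity_feats)
    return seq, types


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Single-head scaled dot-product attention over the joint sequence.
    Simplification: Q = K = V = seq (no learned projection matrices)."""
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in seq:
        # Every query (text or visual) is scored against every key, so text
        # tokens can attend to image features and image features to text.
        row = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in seq])
        weights.append(row)
        outputs.append([sum(w * v[i] for w, v in zip(row, seq)) for i in range(d)])
    return outputs, weights


if __name__ == "__main__":
    text = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.3, 0.0, 0.1]]   # two word vectors
    global_img = [0.2, 0.2, 0.0, 0.0]                       # whole-image vector
    regions = [[0.0, 0.0, 0.5, 0.1], [0.4, 0.0, 0.0, 0.2]]  # two entity regions
    seq, types = build_multimodal_sequence(text, global_img, regions)
    out, attn = self_attention(seq)
    print(types)                  # [0, 0, 1, 2, 2]
    print(len(out), len(out[0]))  # 5 4
```

In a complete model, several such layers with learned query, key and value projections and modality embeddings would typically precede the classification head; the sketch omits them to keep the cross-modal attention pattern visible.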

Similar content being viewed by others

Multimodal Fake News Detection on Fakeddit Dataset Using Transformer-Based Architectures

Instance-Guided Multi-modal Fake News Detection with Dynamic Intra- and Inter-modality Fusion

SAFE: Similarity-Aware Multi-modal Fake News Detection

Acknowledgements

Supported by Zhejiang Lab (No. 2021KE0AB01), Open Fund of Key Laboratory of Monitoring, Evaluation and Early Warning of Territorial Spatial Planning Implementation, Ministry of Natural Resources (No. LMEE-KF2021008), Technology Innovation and Application Development Key Project of Chongqing (No. cstc2021jscx-gksbX0058), National Natural Science Foundation of China (No. 62176029), and Guangxi Key Laboratory of Trusted Software (No. kx202006).

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, Chongqing, 400030, China
Bin Wang & Yong Feng
Key Laboratory of Monitoring, Evaluation and Early Warning of Territorial Spatial Planning Implementation, Ministry of Natural Resources, Chongqing, 401147, China
Xian-cai Xiong
Chongqing Institute of Planning and Natural Resources Investigation and Monitoring, Chongqing, 401121, China
Xian-cai Xiong
8# of Zhejiang Lab, Yuhang District, Hangzhou, 311121, China
Yong-heng Wang
Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, 541004, China
Bao-hua Qiang

Corresponding author

Correspondence to Yong Feng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, B., Feng, Y., Xiong, Xc. et al. Multi-modal transformer using two-level visual features for fake news detection. Appl Intell 53, 10429–10443 (2023). https://doi.org/10.1007/s10489-022-04055-5

Accepted: 30 July 2022
Published: 18 August 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10489-022-04055-5

