research-article

ECENet: Explainable and Context-Enhanced Network for Muti-modal Fact verification

Authors:
Fanrui Zhang, University of Science and Technology of China, Hefei, China (ORCID: 0000-0002-1078-430X)
Jiawei Liu, University of Science and Technology of China, Hefei, China (ORCID: 0000-0001-9940-6366)
Qiang Zhang, University of Science and Technology of China, Hefei, China (ORCID: 0000-0002-7809-5818)
Esther Sun, University of Toronto, Toronto, Canada (ORCID: 0009-0009-0334-2640)
Jingyi Xie, University of Science and Technology of China, Hefei, China (ORCID: 0000-0002-9673-7001)
Zheng-Jun Zha, University of Science and Technology of China, Hefei, China (ORCID: 0000-0003-2510-8993)

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, October 2023, Pages 1231–1240
https://doi.org/10.1145/3581783.3612183

Published: 27 October 2023

ABSTRACT

Recently, falsified claims incorporating both text and images have been disseminated more effectively than those containing text alone, raising significant concerns for multi-modal fact verification. Existing research contributes to multi-modal feature extraction and interaction, but fails to fully utilize and enhance the valuable and intricate semantic relationships between distinct features. Moreover, most detectors merely provide a single outcome judgment and lack an inference process or explanation. Taking these factors into account, we propose a novel Explainable and Context-Enhanced Network (ECENet) for multi-modal fact verification, making the first attempt to integrate multi-clue feature extraction, multi-level feature reasoning, and justification (explanation) generation within a unified framework. Specifically, we propose an Improved Coarse- and Fine-grained Attention Network, equipped with attention mechanisms at two levels of granularity, to facilitate a comprehensive understanding of contextual information. Furthermore, we propose a novel justification generation module via deep reinforcement learning that does not require additional labels. In this module, a sentence extractor agent measures the importance between the query claim and all document sentences at each time step, selecting a suitable number of high-scoring sentences to be rewritten as the explanation of the model. Extensive experiments demonstrate the effectiveness of the proposed method.
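The sentence selection step described in the abstract can be illustrated with a toy sketch. This is not the authors' method: the paper trains a sentence extractor agent with deep reinforcement learning, whereas the code below substitutes a fixed bag-of-words cosine similarity as the importance score. The function names (`tokenize`, `cosine`, `select_sentences`), the parameter `k`, and the example claim and document are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_sentences(claim, sentences, k=2):
    """Return the k sentences most similar to the claim, in document order.

    Assumption: a fixed similarity heuristic stands in for the learned
    extractor agent; a trained policy would produce the scores instead.
    """
    claim_vec = Counter(tokenize(claim))
    scores = [cosine(claim_vec, Counter(tokenize(s))) for s in sentences]
    # Rank indices by descending score; the stable sort keeps earlier
    # sentences first when scores tie.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


# Hypothetical claim and evidence document.
claim = "The mayor opened a new bridge on Monday."
document = [
    "Heavy rain fell across the region last week.",
    "The new bridge was opened by the mayor on Monday morning.",
    "Local shops reported higher sales.",
    "Officials said the bridge cost less than planned.",
]
selected = select_sentences(claim, document, k=2)
```

In this sketch the selected sentences would then be handed to a rewriting step to form the explanation; that step is omitted here because the abstract does not specify how the rewriting is done.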

Index Terms

1. Information systems
  1. Information systems applications
    1. Multimedia information systems
  2. World Wide Web
    1. Web applications
      1. Social networks

Published in
MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
General Chairs:
Abdulmotaleb El Saddik, University of Ottawa, Canada & MBZUAI, UAE
Tao Mei, HiDream.ai, China
Rita Cucchiara, University of Modena and Reggio Emilia, Italy
Program Chairs:
Marco Bertini, University of Florence, Italy
Diana Patricia Tobon Vallejo, Universidad de Medellin, Colombia
Pradeep K. Atrey, University at Albany, State University of New York, USA
M. Shamim Hossain, King Saud University, KSA
Copyright © 2023 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
attention mechanism
deep reinforcement learning
interpretability
multi-modal fact verification
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%