CAMEL: Capturing Metaphorical Alignment with Context Disentangling for Multimodal Emotion Recognition

Authors

  • Linhao Zhang — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences
  • Li Jin — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute
  • Guangluan Xu — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute
  • Xiaoyu Li — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute
  • Cai Xu — School of Computer Science and Technology, Xidian University
  • Kaiwen Wei — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences
  • Nayu Liu — School of Computer Science and Technology, Tiangong University
  • Haonan Liu — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v38i8.28787

Keywords:

DMKM: Mining of Visual, Multimedia & Multimodal Data, CMS: Affective Computing, ML: Multimodal Learning, NLP: Language Grounding & Multi-modal NLP, NLP: Sentiment Analysis, Stylistic Analysis, and Argument Mining

Abstract

Understanding the emotional polarity of multimodal content with metaphorical characteristics, such as memes, poses a significant challenge in Multimodal Emotion Recognition (MER). Previous MER research has overlooked the phenomenon of metaphorical alignment in multimedia content, which involves non-literal associations between concepts to convey implicit emotional tones. Metaphor-agnostic MER methods may be misled by isolated unimodal emotions, which are distinct from the real emotions blended in multimodal metaphors. Moreover, contextual semantics can further affect the emotions associated with similar metaphors, raising the challenge of maintaining contextual compatibility. To address the issue of metaphorical alignment in MER, we propose to leverage a conditional generative approach for capturing metaphorical analogies. Our approach formulates schematic prompts and corresponding references based on theoretical foundations, which allows the model to better grasp metaphorical nuances. To maintain contextual sensitivity, we incorporate a disentangled contrastive matching mechanism, whose intensity is regulated through curricular adjustment during the learning process. Automatic and human evaluation experiments on two benchmarks show that our model provides considerable and stable improvements in recognizing multimodal emotions with metaphorical attributes.

Published

2024-03-24

How to Cite

Zhang, L., Jin, L., Xu, G., Li, X., Xu, C., Wei, K., Liu, N., & Liu, H. (2024). CAMEL: Capturing Metaphorical Alignment with Context Disentangling for Multimodal Emotion Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 9341-9349. https://doi.org/10.1609/aaai.v38i8.28787

Issue

Section

AAAI Technical Track on Data Mining & Knowledge Management