CAMEL: Capturing Metaphorical Alignment with Context Disentangling for Multimodal Emotion Recognition

Authors

  • Linhao Zhang — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences
  • Li Jin — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute
  • Guangluan Xu — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute
  • Xiaoyu Li — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute
  • Cai Xu — School of Computer Science and Technology, Xidian University
  • Kaiwen Wei — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences
  • Nayu Liu — School of Computer Science and Technology, Tiangong University
  • Haonan Liu — Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v38i8.28787

Keywords:

DMKM: Mining of Visual, Multimedia & Multimodal Data, CMS: Affective Computing, ML: Multimodal Learning, NLP: Language Grounding & Multi-modal NLP, NLP: Sentiment Analysis, Stylistic Analysis, and Argument Mining

Abstract

Understanding the emotional polarity of multimodal content with metaphorical characteristics, such as memes, poses a significant challenge in Multimodal Emotion Recognition (MER). Previous MER research has overlooked the phenomenon of metaphorical alignment in multimedia content, which involves non-literal associations between concepts to convey implicit emotional tones. Metaphor-agnostic MER methods may be misled by isolated unimodal emotions, which are distinct from the real emotions blended in multimodal metaphors. Moreover, contextual semantics can further affect the emotions associated with similar metaphors, raising the challenge of maintaining contextual compatibility. To address the issue of metaphorical alignment in MER, we propose to leverage a conditional generative approach for capturing metaphorical analogies. Our approach formulates schematic prompts and corresponding references based on theoretical foundations, which allows the model to better grasp metaphorical nuances. To maintain contextual sensitivity, we incorporate a disentangled contrastive matching mechanism, whose intensity is regulated through curricular adjustment during the learning process. Automatic and human evaluation experiments on two benchmarks show that our model provides considerable and stable improvements in recognizing multimodal emotions with metaphorical attributes.

Published

2024-03-24

How to Cite

Zhang, L., Jin, L., Xu, G., Li, X., Xu, C., Wei, K., Liu, N., & Liu, H. (2024). CAMEL: Capturing Metaphorical Alignment with Context Disentangling for Multimodal Emotion Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 9341-9349. https://doi.org/10.1609/aaai.v38i8.28787

Issue

Section

AAAI Technical Track on Data Mining & Knowledge Management