Hypergraph-Guided Disentangled Spectrum Transformer Networks for Near-Infrared Facial Expression Recognition

Authors

  • Bingjun Luo, Tsinghua University
  • Haowen Wang, Tsinghua University
  • Jinpeng Wang, Tsinghua University
  • Junjie Zhu, Tsinghua University
  • Xibin Zhao, Tsinghua University
  • Yue Gao, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v38i9.28874

Keywords:

HAI: Emotional Intelligence, APP: Humanities & Computational Social Science, CMS: Social Cognition And Interaction

Abstract

Owing to its strong robustness to illumination variations, near-infrared (NIR) imaging can be an effective and essential complement to visible (VIS) facial expression recognition in low-light or completely dark conditions. However, facial expression recognition (FER) from NIR images is more challenging than traditional FER because of the limited data scale and the difficulty of extracting discriminative features from images that lack full visible-light content. In this paper, we make the first attempt at deep NIR facial expression recognition and propose a novel method called the near-infrared facial expression transformer (NFER-Former). Specifically, to make full use of the abundant label information available in the VIS domain, we introduce a Self-Attention Orthogonal Decomposition mechanism that disentangles the expression information and the spectrum information in the input image, so that expression features can be extracted without interference from spectrum variation. We also propose a Hypergraph-Guided Feature Embedding method that models key facial behaviors and learns the structure of the complex correlations between them, thereby alleviating the interference of inter-class similarity. Additionally, we construct a large NIR-VIS Facial Expression dataset that includes 360 subjects to better validate the effectiveness of NFER-Former. Extensive experiments and ablation studies show that NFER-Former significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.
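
The abstract does not say how the orthogonal decomposition is enforced. As a rough illustration only, one generic way to encourage two feature sets to carry separate information is to penalize the squared cosine similarity between them. The sketch below assumes each image yields one expression vector and one spectrum vector; the function names `cosine` and `orthogonality_penalty` are hypothetical and are not taken from the paper, and this is not the authors' Self-Attention Orthogonal Decomposition.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def orthogonality_penalty(expr_feats, spec_feats):
    """Mean squared cosine similarity between paired feature vectors.

    Hypothetical helper: expr_feats[i] and spec_feats[i] are assumed to be
    the expression and spectrum features of the same image. The penalty is
    0 when every pair is orthogonal and 1 when every pair is collinear.
    """
    if len(expr_feats) != len(spec_feats):
        raise ValueError("feature lists must have the same length")
    if not expr_feats:
        raise ValueError("feature lists must not be empty")
    total = sum(cosine(e, s) ** 2 for e, s in zip(expr_feats, spec_feats))
    return total / len(expr_feats)


if __name__ == "__main__":
    # Orthogonal pair: no penalty.
    print(orthogonality_penalty([[1.0, 0.0]], [[0.0, 1.0]]))
    # Collinear pair: maximal penalty.
    print(orthogonality_penalty([[1.0, 0.0]], [[2.0, 0.0]]))
```

In a training setup, a term like this would typically be added to the classification loss with a small weight, so that minimizing it pushes the two feature sets apart without dominating the expression objective.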

Published

2024-03-24

How to Cite

Luo, B., Wang, H., Wang, J., Zhu, J., Zhao, X., & Gao, Y. (2024). Hypergraph-Guided Disentangled Spectrum Transformer Networks for Near-Infrared Facial Expression Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(9), 10101-10109. https://doi.org/10.1609/aaai.v38i9.28874

Issue

Section

AAAI Technical Track on Humans and AI