Abstract
Facial expression recognition (FER) in the wild is one of the most challenging visual tasks owing to uncontrolled factors such as occlusion, pose, and subtle variations in real scenes. These factors directly degrade the robustness of current networks, especially because most methods that learn in a single feature space fail to extract potentially discriminative features and to provide a deeper understanding of expressions. To address these issues, we propose a novel hybrid attention-aware learning network (HALNet), which comprises a feature compactness network (FCN), a hybrid attention enhancement network (HAEN), and a joint loss optimization strategy. First, FCN extracts basic expression features and optimizes the intra- and inter-class distributions simultaneously. Next, HAEN constructs a multi-level feature enhancement space by fusing CNN-based and transformer-based hybrid attention in parallel, which deepens the understanding of expressions. Finally, expression classification is performed under the supervision of the joint loss. Extensive experiments on several of the most widely used in-the-wild expression datasets indicate that our method outperforms several current state-of-the-art methods, achieving accuracies of 90.29%, 90.04%, and 61.75% on RAF-DB, FERPlus, and AffectNet, respectively. Cross-dataset evaluation and evaluation on occlusion and pose-variation datasets further substantiate the generalization and robustness of our approach.
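The abstract does not give the form of the joint loss, so the following is only a minimal sketch of how a classification term might be combined with intra-class compactness and inter-class separation terms. The center-based formulation, the function name `joint_loss`, the weights `lam_intra` and `lam_inter`, and the `margin` are all assumptions made for illustration and should not be read as the HALNet formulation.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, stabilised by subtracting the max logit."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def joint_loss(features, logits, labels, centers,
               lam_intra=0.5, lam_inter=0.5, margin=1.0):
    """Hypothetical joint loss: classification + compactness + separation.

    features: list of feature vectors, one per sample
    logits:   list of class-score vectors, one per sample
    labels:   list of integer class indices
    centers:  list of class-center vectors (assumed to be given)
    """
    n = len(labels)
    # Classification term: mean cross-entropy over the batch.
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / n
    # Intra-class term: pull each feature toward the center of its own class.
    intra = sum(sq_dist(f, centers[y]) for f, y in zip(features, labels)) / n
    # Inter-class term: penalise pairs of centers closer than the margin.
    pairs = [(i, j) for i in range(len(centers))
             for j in range(i + 1, len(centers))]
    inter = 0.0
    if pairs:
        inter = sum(max(0.0, margin - sq_dist(centers[i], centers[j]))
                    for i, j in pairs) / len(pairs)
    return ce + lam_intra * intra + lam_inter * inter
```

With features that sit exactly on well-separated centers and uninformative logits, only the classification term remains; moving the centers closer than the assumed margin activates the separation penalty.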
Data Availability
All datasets used in this paper are freely available, and their use is authorized on request for non-profit purposes.
Acknowledgements
This work was supported in part by the National Science Foundation of China under Grants 61966035, 62266043, and U1803261, in part by the National Science and Technology Major Project under Grant 95-Y50G37-9001-22/23, and in part by the Basic Research Foundation of Universities in the Xinjiang Uygur Autonomous Region of China under Grant 2021D01C083.
Author information
Contributions
All authors were involved in the conceptualization and design of the study. GW and LZ performed material preparation, data collection, and analysis. GW and ZW wrote the first draft of the manuscript, and all authors commented on previous versions of the manuscript. QY read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gong, W., La, Z., Qian, Y. et al. Hybrid Attention-Aware Learning Network for Facial Expression Recognition in the Wild. Arab J Sci Eng (2024). https://doi.org/10.1007/s13369-023-08538-6
DOI: https://doi.org/10.1007/s13369-023-08538-6