Hybrid Attention-Aware Learning Network for Facial Expression Recognition in the Wild

Gong, Weijun; La, Zhiyao; Qian, Yurong; Zhou, Weihang

doi:10.1007/s13369-023-08538-6

Research Article-Computer Engineering and Computer Science
Published: 05 January 2024

Arabian Journal for Science and Engineering (2024)

Weijun Gong¹,
Zhiyao La²,
Yurong Qian¹,²,³ &
Weihang Zhou²

Abstract

Facial expression recognition (FER) in the wild is one of the most challenging visual tasks owing to uncontrolled factors such as occlusion, pose, and subtle variation in real scenes. These factors directly affect the robustness of current networks, especially because most methods that learn in a single feature space fail to extract potentially discriminative features and to provide a deeper understanding of expressions. To address these issues, we propose a novel hybrid attention-aware learning network (HALNet), which comprises a feature compactness network (FCN), a hybrid attention enhancement network (HAEN), and a joint loss optimization strategy. First, FCN performs basic expression feature extraction and optimizes intra- and inter-class distributions simultaneously. Afterward, HAEN constructs a multi-level feature enhancement space by fusing CNN-based and transformer-based hybrid attention in parallel to deepen the understanding of expressions. Finally, expression classification is performed by supervised optimization with the joint loss. Extensive experiments on some of the most widely used in-the-wild expression datasets indicate that our method is superior to several current state-of-the-art methods, obtaining accuracies of 90.29%, 90.04%, and 61.75% on RAF-DB, FERPlus, and AffectNet, respectively. Assessments on cross-dataset settings and on occlusion and pose variation datasets further substantiate our approach's sound generalization and robustness.
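The abstract does not spell out the terms of the joint loss, so the following is only a minimal sketch of one common way to combine a classification loss with an intra-class compactness term (a center-loss-style penalty). The function names, the `weight` parameter, and the choice of a center-based term are assumptions made for illustration; they are not taken from the paper, and the inter-class part of the optimization is not shown.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stable form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_distance(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits, features, labels, centers, weight=0.01):
    """Mean cross-entropy plus a weighted intra-class compactness term.

    The center-based term and `weight` are illustrative assumptions,
    not the loss defined in the paper.
    """
    n = len(labels)
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / n
    compact = sum(
        center_distance(f, centers[y]) for f, y in zip(features, labels)
    ) / n
    return ce + weight * compact


if __name__ == "__main__":
    # Toy batch: two samples, three classes, two-dimensional features.
    logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
    features = [[1.0, 0.0], [0.0, 1.0]]
    labels = [0, 1]
    centers = {0: [0.9, 0.1], 1: [0.1, 0.9], 2: [0.5, 0.5]}
    print(joint_loss(logits, features, labels, centers))
```

In this sketch, pulling each feature toward its class center tightens intra-class clusters, while the cross-entropy term keeps the classes separable; a larger `weight` trades classification fit for compactness.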

Data Availability

All datasets used in the paper are freely available, and their use is authorized on request for non-profit purposes.

Acknowledgements

This work was supported in part by the National Science Foundation of China under Grants 61966035, 62266043, and U1803261, in part by the National Science and Technology Major Project under Grant 95-Y50G37-9001-22/23, and in part by the Basic Research Foundation of Universities in the Xinjiang Uygur Autonomous Region of China under Grant 2021D01C083.

Author information

Authors and Affiliations

School of Information Science and Engineering, Xinjiang University, Urumqi, 830046, China
Weijun Gong & Yurong Qian
School of Software, Xinjiang University, Urumqi, 830046, China
Zhiyao La, Yurong Qian & Weihang Zhou
Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Urumqi, 830046, China
Yurong Qian

Contributions

All authors were involved in the conceptualization and design of the study. GW and LZ performed material preparation, data collection, and analysis. GW and ZW wrote the first draft of the manuscript, and all authors commented on previous versions of the manuscript. QY read and approved the final manuscript.

Corresponding author

Correspondence to Yurong Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gong, W., La, Z., Qian, Y. et al. Hybrid Attention-Aware Learning Network for Facial Expression Recognition in the Wild. Arab J Sci Eng (2024). https://doi.org/10.1007/s13369-023-08538-6

Received: 08 August 2023
Accepted: 19 November 2023
Published: 05 January 2024
DOI: https://doi.org/10.1007/s13369-023-08538-6
