
Hybrid Attention-Aware Learning Network for Facial Expression Recognition in the Wild

  • Research Article - Computer Engineering and Computer Science
  • Published in: Arabian Journal for Science and Engineering (2024)

Abstract

Facial expression recognition (FER) in the wild is one of the most challenging visual tasks owing to uncontrolled factors such as occlusion, pose variation, and subtle expression changes in real scenes. These factors directly degrade the robustness of current networks, especially since most methods that learn a single feature space fail to extract potentially discriminative features and thus cannot provide a deeper understanding of expressions. To address these issues, we propose a novel hybrid attention-aware learning network (HALNet), which comprises a feature compactness network (FCN), a hybrid attention enhancement network (HAEN), and a joint loss optimization strategy. First, the FCN performs basic expression feature extraction while simultaneously optimizing intra- and inter-class distributions. Next, the HAEN constructs a multi-level feature enhancement space by fusing CNN-based and transformer-based attention in parallel, effectively deepening the understanding of expressions. Finally, expression classification is supervised and optimized with the joint loss. Extensive experiments on several of the most widely used in-the-wild expression datasets show that our method outperforms several state-of-the-art methods, achieving accuracies of 90.29%, 90.04%, and 61.75% on RAF-DB, FERPlus, and AffectNet, respectively. Evaluations in cross-dataset settings and on occlusion and pose-variation subsets further substantiate the generalization and robustness of our approach.
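To make the pipeline described in the abstract concrete, the following PyTorch sketch wires together an FCN-style convolutional backbone, a HAEN-style stage that fuses a CNN channel-attention branch with a transformer self-attention branch in parallel, and a joint loss combining cross-entropy with a center-loss-style compactness term. This is a minimal sketch under assumed module choices, sizes, and hyperparameters, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HALNetSketch(nn.Module):
    """Illustrative HALNet-style pipeline (all names and sizes assumed):
    FCN stand-in -> parallel CNN/transformer attention -> classifier."""

    def __init__(self, num_classes: int = 7, dim: int = 256):
        super().__init__()
        # FCN stand-in: a small backbone producing a spatial feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # CNN branch of HAEN: squeeze-and-excitation-style channel attention.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // 4, 1), nn.ReLU(),
            nn.Conv2d(dim // 4, dim, 1), nn.Sigmoid(),
        )
        # Transformer branch of HAEN: self-attention over spatial tokens.
        self.token_att = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor):
        f = self.backbone(x)                    # (B, C, H, W)
        cnn_branch = f * self.channel_att(f)    # channel-attended features
        tokens = f.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        trans_out, _ = self.token_att(tokens, tokens, tokens)
        trans_branch = trans_out.transpose(1, 2).reshape_as(f)
        fused = cnn_branch + trans_branch       # parallel fusion of both branches
        feat = fused.mean(dim=(2, 3))           # global average pooling
        return self.classifier(feat), feat


def joint_loss(logits, feats, labels, centers, lam=0.01):
    """Cross-entropy plus a center-loss-style term that pulls features toward
    their class centers (tightening intra-class distributions); the weight
    lam is an assumed hyperparameter."""
    ce = F.cross_entropy(logits, labels)
    compact = ((feats - centers[labels]) ** 2).sum(dim=1).mean()
    return ce + lam * compact


# Example usage with random data and 7 basic expression classes:
model = HALNetSketch()
images = torch.randn(4, 3, 96, 96)
labels = torch.randint(0, 7, (4,))
centers = torch.randn(7, 256)  # learnable class centers in a real setup
logits, feats = model(images)
loss = joint_loss(logits, feats, labels, centers)
```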



Data Availability

All datasets used in this paper are freely available, and access requests are authorized for non-profit purposes.


Acknowledgements

This work was supported in part by the National Science Foundation of China under Grants 61966035, 62266043, and U1803261; in part by the National Science and Technology Major Project under Grant 95-Y50G37-9001-22/23; and in part by the Basic Research Foundation of Universities in the Xinjiang Uygur Autonomous Region of China under Grant 2021D01C083.

Author information


Contributions

All authors were involved in the conceptualization and design of the study. GW and LZ performed material preparation, data collection, and analysis. GW and ZW wrote the first draft of the manuscript, and all authors commented on previous versions of the manuscript. QY read and approved the final manuscript.

Corresponding author

Correspondence to Yurong Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gong, W., La, Z., Qian, Y. et al. Hybrid Attention-Aware Learning Network for Facial Expression Recognition in the Wild. Arab J Sci Eng (2024). https://doi.org/10.1007/s13369-023-08538-6
