Abstract
Emotion recognition in the wild (ERW) is a challenging task due to the unknown and unconstrained scenes of wild environments. Unlike previous approaches that rely on facial expression or posture alone, a growing number of studies utilize contextual information to improve emotion recognition performance. In this paper, we propose a new dual-view context-aware network (DVC-Net) that fully exploits contextual information from global and local views and balances individual features against context features through an attention mechanism. The proposed DVC-Net consists of three parallel modules: (1) a body-aware stream that suppresses the uncertainties of body-gesture feature representation, (2) a global context-aware stream based on salient context that captures effective global-level context, and (3) a local context-aware stream based on a graph convolutional network that finds local discriminative features carrying emotional cues. Quantitative evaluations have been carried out on two in-the-wild emotion recognition datasets, and the experimental results demonstrate that the proposed DVC-Net outperforms state-of-the-art methods.
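The core fusion idea described above (three parallel streams whose contributions are balanced by attention) can be illustrated with a minimal sketch. This is not the authors' implementation: the `softmax`-normalized per-stream scores, the function names, and the toy feature vectors are all hypothetical, standing in for the learned attention and CNN/GCN features of the actual DVC-Net.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(streams, scores):
    """Attention-weighted fusion of per-stream feature vectors.

    streams: equal-length feature vectors, e.g. from the body-aware,
             global context-aware, and local context-aware streams.
    scores:  one raw attention score per stream (learned in practice;
             fixed here for illustration).
    Returns the fused vector and the normalized attention weights.
    """
    alpha = softmax(scores)
    dim = len(streams[0])
    fused = [sum(a * f[i] for a, f in zip(alpha, streams)) for i in range(dim)]
    return fused, alpha

# Toy 4-d features for the three streams; equal scores give equal weights,
# so the fused vector is simply the mean of the three streams.
fused, alpha = fuse_streams([[1.0] * 4, [2.0] * 4, [3.0] * 4], [0.0, 0.0, 0.0])
```

In the paper's setting, the attention scores would be produced by the network itself, letting it down-weight an unreliable stream (e.g. an occluded body) in favor of contextual cues.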
Data availability
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgements
This work is supported by the National Natural Science Foundation of China [grant number 61871278].
Ethics declarations
Conflicts of interest
The authors declare that there are no conflicts of interest with respect to the research, authorship, and/or publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qing, L., Wen, H., Chen, H. et al. DVC-Net: a new dual-view context-aware network for emotion recognition in the wild. Neural Comput & Applic 36, 653–665 (2024). https://doi.org/10.1007/s00521-023-09040-8