Abstract
Real-world facial expression recognition (FER) datasets suffer from noisy annotations due to crowd-sourcing, ambiguity of expressions, annotator subjectivity, and inter-class similarity. Deep networks have a strong capacity to memorize these noisy annotations, leading to corrupted feature embeddings and poor generalization. Recent works address the problem by selecting samples with clean labels based on loss values, using a fixed threshold for all classes, which is not always reliable. They also depend on the noise rate in the data, which is not always available. In this work, we propose a novel FER framework (DNFER) that selects samples with clean labels based on a class-specific threshold computed dynamically in each mini-batch. Specifically, DNFER combines supervised training on the selected clean samples with unsupervised consistency training on all samples. Unlike other methods, this threshold is independent of the noise rate and requires no clean data. In addition, to learn effectively from noisy annotated samples, the posterior distributions of weakly-augmented and strongly-augmented views of an image are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on both synthetic and real noisy annotated FER datasets. DNFER also obtains state-of-the-art performance on popular benchmark datasets, with 90.41% on RAFDB, 57.77% on SFEW, 89.32% on FERPlus, and 65.22% on AffectNet-7. Our source code is publicly available at https://github.com/1980x/DNFER.
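The select-then-regularize idea in the abstract can be sketched in a few lines. This is a minimal illustration only, not the paper's exact formulation: the choice of the per-class batch-mean confidence as the dynamic threshold and the symmetric KL divergence for the consistency term are assumptions made here for concreteness.

```python
import numpy as np

def select_clean_samples(probs, labels, num_classes):
    """Dynamic class-specific selection within a mini-batch: a sample is
    treated as 'clean' if its confidence in the annotated class exceeds a
    threshold computed per class from the current batch (here, the mean
    confidence of samples annotated with that class -- an assumed statistic)."""
    conf = probs[np.arange(len(labels)), labels]  # confidence in annotated label
    clean = np.zeros(len(labels), dtype=bool)
    for c in range(num_classes):
        idx = labels == c
        if idx.any():
            thr = conf[idx].mean()  # threshold recomputed every mini-batch
            clean |= idx & (conf >= thr)
    return clean

def consistency_loss(p_weak, p_strong, eps=1e-8):
    """Unsupervised consistency term aligning the posterior distributions of
    weakly- and strongly-augmented views (symmetric KL, an assumed choice)."""
    def kl(p, q):
        return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return 0.5 * (kl(p_weak, p_strong) + kl(p_strong, p_weak)).mean()
```

In training, the supervised cross-entropy would be applied only to the mask returned by `select_clean_samples`, while `consistency_loss` is applied to all samples regardless of their (possibly noisy) labels.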
Acknowledgements
We dedicate this work to Bhagawan Sri Sathya Sai Baba, Divine Founder Chancellor of Sri Sathya Sai Institute of Higher Learning, PrasanthiNilyam, A.P., India.
Ethics declarations
Competing interests
The authors have no competing interests.
Code availability
Source code is publicly available at https://github.com/1980x/DNFER.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gera, D., Raj Kumar, B.V., Badveeti, N.S.K. et al. Dynamic adaptive threshold based learning for noisy annotations robust facial expression recognition. Multimed Tools Appl 83, 49537–49566 (2024). https://doi.org/10.1007/s11042-023-17510-3