
Dynamic adaptive threshold based learning for noisy annotations robust facial expression recognition

Published in: Multimedia Tools and Applications

Abstract

Real-world facial expression recognition (FER) datasets suffer from noisy annotations caused by crowd-sourcing, ambiguity in expressions, annotator subjectivity, and inter-class similarity. Moreover, deep networks have a strong capacity to memorize noisy annotations, leading to corrupted feature embeddings and poor generalization. Recent works address this problem by selecting samples with clean labels based on loss values, using a single fixed threshold for all classes, which is not always reliable. These methods also depend on the noise rate of the data, which is not always available. In this work, we propose a novel FER framework (DNFER) that selects samples with clean labels using a class-specific threshold computed dynamically in each mini-batch. Specifically, DNFER combines supervised training on the selected clean samples with unsupervised consistency training on all samples. Unlike other methods, this threshold is independent of the noise rate and requires no clean data. In addition, to learn effectively from noisy annotated samples, the posterior distributions of each weakly-augmented image and its strongly-augmented counterpart are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on both synthetic and real noisy annotated FER datasets. DNFER also obtains state-of-the-art performance on popular benchmark datasets: 90.41% on RAFDB, 57.77% on SFEW, 89.32% on FERPlus, and 65.22% on AffectNet-7. Our source code is publicly available at https://github.com/1980x/DNFER.
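The select-then-train recipe described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration under our own assumptions: the function names are ours, and we use the mini-batch per-class mean confidence as the dynamic threshold and a KL divergence between the two augmented views' posteriors as the consistency loss; these are illustrative stand-ins, not necessarily the paper's exact statistics.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax over logits."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_class_thresholds(probs, labels, num_classes):
    """Per-class threshold, recomputed each mini-batch (illustrative:
    mean confidence the model assigns to the annotated class, taken
    over the batch samples annotated with that class)."""
    conf = probs[np.arange(len(labels)), labels]
    thr = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            thr[c] = conf[mask].mean()
    return thr

def select_clean(probs, labels, thresholds):
    """A sample counts as 'clean' if its confidence on the annotated
    label reaches its own class's dynamic threshold."""
    conf = probs[np.arange(len(labels)), labels]
    return conf >= thresholds[labels]

def consistency_loss(p_weak, p_strong, eps=1e-8):
    """KL(weak || strong), averaged over the batch: aligns the
    strongly-augmented view's posterior with the weak view's."""
    kl = np.sum(p_weak * (np.log(p_weak + eps) - np.log(p_strong + eps)), axis=1)
    return kl.mean()

# Demo on a random mini-batch of 8 faces with 7 expression classes.
rng = np.random.default_rng(0)
probs = softmax(rng.normal(size=(8, 7)))
labels = rng.integers(0, 7, size=8)
thr = dynamic_class_thresholds(probs, labels, num_classes=7)
clean = select_clean(probs, labels, thr)  # boolean mask of "clean" samples
```

In a full training loop, the supervised cross-entropy would be applied only to the `clean` subset, while `consistency_loss` would be applied to every sample; because the threshold is a per-class batch statistic, no noise-rate estimate or clean validation set is needed.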





Acknowledgements

We dedicate this work to Bhagawan Sri Sathya Sai Baba, Divine Founder Chancellor of Sri Sathya Sai Institute of Higher Learning, PrasanthiNilyam, A.P., India.


Corresponding author

Correspondence to Darshan Gera.

Ethics declarations

Competing interests

Authors have no competing interests.

Code availability

Source code is publicly available at https://github.com/1980x/DNFER.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gera, D., Raj Kumar, B.V., Badveeti, N.S.K. et al. Dynamic adaptive threshold based learning for noisy annotations robust facial expression recognition. Multimed Tools Appl 83, 49537–49566 (2024). https://doi.org/10.1007/s11042-023-17510-3

