
HEU Emotion: a large-scale database for multimodal emotion recognition in the wild

  • Original Article
  • Published in Neural Computing and Applications

Abstract

The study of affective computing in the wild is underpinned by databases, yet existing multimodal emotion databases collected under real-world conditions are few and small, covering a limited number of subjects and a single language. To address this gap, we collected, annotated, and prepared for release a new natural-state video database, called HEU Emotion. HEU Emotion contains 19,004 video clips in total, divided into two parts according to the data source. The first part contains videos downloaded from Tumblr, Google, and Giphy, covering 10 emotions and two modalities (facial expression and body posture). The second part consists of clips extracted manually from movies, TV series, and variety shows, covering 10 emotions and three modalities (facial expression, body posture, and emotional speech). With 9,951 subjects, HEU Emotion is by far the most extensive multimodal emotion database. To provide a benchmark for emotion recognition, we evaluated HEU Emotion with a range of conventional machine learning and deep learning methods. We also proposed a multimodal attention module that fuses multimodal features adaptively. After multimodal fusion, the recognition accuracies on the two parts increased by 2.19% and 4.01%, respectively, over single-modal facial expression recognition.
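To make the fusion idea concrete, the following is a minimal sketch of an adaptive multimodal attention module in PyTorch. It is an illustration under stated assumptions, not the paper's implementation: the class name MultimodalAttentionFusion, the 512-dimensional features, the shared two-layer scoring head, and the softmax weighting over modalities are all choices made here for clarity.

```python
# A minimal sketch of adaptive multimodal attention fusion, loosely
# following the idea described in the abstract. All names, dimensions,
# and the exact weighting scheme are assumptions for illustration;
# the paper's actual architecture may differ.
import torch
import torch.nn as nn


class MultimodalAttentionFusion(nn.Module):
    """Fuses per-modality feature vectors with learned attention weights."""

    def __init__(self, feature_dim: int):
        super().__init__()
        # A scoring head shared across modalities (an assumption): maps
        # each modality's feature vector to a scalar relevance score.
        self.score = nn.Sequential(
            nn.Linear(feature_dim, feature_dim // 2),
            nn.Tanh(),
            nn.Linear(feature_dim // 2, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, num_modalities, feature_dim)
        scores = self.score(features)            # (batch, M, 1)
        weights = torch.softmax(scores, dim=1)   # attention over modalities
        fused = (weights * features).sum(dim=1)  # (batch, feature_dim)
        return fused


# Usage with hypothetical face / posture / speech embeddings of size 512.
if __name__ == "__main__":
    fusion = MultimodalAttentionFusion(feature_dim=512)
    face, posture, speech = (torch.randn(8, 512) for _ in range(3))
    fused = fusion(torch.stack([face, posture, speech], dim=1))
    print(fused.shape)  # torch.Size([8, 512])
```

A softmax over per-modality scores lets such a module down-weight an unreliable modality (for example, noisy speech in a movie clip) on a per-sample basis, rather than combining modalities with fixed fusion weights.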




Author information

Corresponding author

Correspondence to Kejun Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, J., Wang, C., Wang, K. et al. HEU Emotion: a large-scale database for multimodal emotion recognition in the wild. Neural Comput & Applic 33, 8669–8685 (2021). https://doi.org/10.1007/s00521-020-05616-w
