
HEU Emotion: a large-scale database for multimodal emotion recognition in the wild

  • Original Article
  • Published in Neural Computing and Applications

Abstract

The study of affective computing in the wild is underpinned by databases, yet existing multimodal emotion databases collected under real-world conditions are few and small, covering a limited number of subjects and a single language. To address this gap, we collected, annotated, and prepared for release a new natural-state video database, called HEU Emotion. HEU Emotion contains 19,004 video clips in total, divided into two parts according to the data source. The first part contains videos downloaded from Tumblr, Google, and Giphy, covering 10 emotions and two modalities (facial expression and body posture). The second part consists of clips extracted manually from movies, TV series, and variety shows, covering 10 emotions and three modalities (facial expression, body posture, and emotional speech). With 9,951 subjects, HEU Emotion is by far the most extensive multimodal emotion database. To provide a benchmark for emotion recognition, we evaluated HEU Emotion with a range of conventional machine learning and deep learning methods. We also proposed a multimodal attention module that fuses multimodal features adaptively. After multimodal fusion, the recognition accuracies on the two parts increased by 2.19% and 4.01%, respectively, over single-modal facial expression recognition.
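To make the fusion idea concrete, the following is a minimal sketch of an adaptive multimodal attention module in PyTorch. It is an illustration under stated assumptions, not the paper's implementation: the class name MultimodalAttentionFusion, the 512-dimensional features, the shared two-layer scoring head, and the softmax weighting over modalities are all choices made here for clarity.

```python
# A minimal sketch of adaptive multimodal attention fusion, loosely
# following the idea described in the abstract. All names, dimensions,
# and the exact weighting scheme are assumptions for illustration;
# the paper's actual architecture may differ.
import torch
import torch.nn as nn


class MultimodalAttentionFusion(nn.Module):
    """Fuses per-modality feature vectors with learned attention weights."""

    def __init__(self, feature_dim: int):
        super().__init__()
        # A scoring head shared across modalities (an assumption): maps
        # each modality's feature vector to a scalar relevance score.
        self.score = nn.Sequential(
            nn.Linear(feature_dim, feature_dim // 2),
            nn.Tanh(),
            nn.Linear(feature_dim // 2, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, num_modalities, feature_dim)
        scores = self.score(features)            # (batch, M, 1)
        weights = torch.softmax(scores, dim=1)   # attention over modalities
        fused = (weights * features).sum(dim=1)  # (batch, feature_dim)
        return fused


# Usage with hypothetical face / posture / speech embeddings of size 512.
if __name__ == "__main__":
    fusion = MultimodalAttentionFusion(feature_dim=512)
    face, posture, speech = (torch.randn(8, 512) for _ in range(3))
    fused = fusion(torch.stack([face, posture, speech], dim=1))
    print(fused.shape)  # torch.Size([8, 512])
```

A softmax over per-modality scores lets such a module down-weight an unreliable modality (for example, noisy speech in a movie clip) on a per-sample basis, rather than combining modalities with fixed fusion weights.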




Author information

Corresponding author

Correspondence to Kejun Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, J., Wang, C., Wang, K. et al. HEU Emotion: a large-scale database for multimodal emotion recognition in the wild. Neural Comput & Applic 33, 8669–8685 (2021). https://doi.org/10.1007/s00521-020-05616-w
