ABSTRACT
Continuous emotion recognition is of great significance in affective computing and human-computer interaction. Most existing methods for video-based continuous emotion recognition rely on facial expression. However, other cues, including head pose and eye gaze, are also closely related to human emotion but have not been well explored for continuous emotion recognition. On the one hand, head pose and eye gaze affect the credibility of facial expression features; on the other hand, they carry emotional cues of their own that are complementary to facial expression. Accordingly, in this paper we propose two ways to incorporate these two cues into continuous emotion recognition: an attention mechanism based on head pose and eye gaze that guides the utilization of facial features, and an auxiliary line that helps extract additional emotion information from head pose and eye gaze themselves. Experiments are conducted on RECOLA, a benchmark database for continuous emotion recognition, and the results show that our framework outperforms other state-of-the-art methods by making full use of head pose and eye gaze cues in addition to facial expression.
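The attention mechanism described in the abstract, which weights facial features by their credibility given head pose and eye gaze, can be sketched as a simple per-frame gating. The snippet below is a minimal illustration under assumptions, not the paper's actual architecture: the linear scoring layer, the sigmoid gate, and all names (`pose_gaze_attention`, `W`, `b`) are hypothetical.

```python
import numpy as np

def pose_gaze_attention(facial_feats, pose_gaze_feats, W, b):
    """Gate per-frame facial features with a scalar credibility score
    derived from head-pose / eye-gaze features (hypothetical sketch).

    facial_feats:    (T, D) facial expression features per frame
    pose_gaze_feats: (T, K) concatenated head-pose and eye-gaze features
    W, b:            (K,) weight vector and scalar bias of a linear scorer
    """
    scores = pose_gaze_feats @ W + b          # (T,) raw credibility scores
    alpha = 1.0 / (1.0 + np.exp(-scores))     # sigmoid -> weights in (0, 1)
    # frames with extreme pose / averted gaze would get low alpha,
    # down-weighting their (less reliable) facial features
    return facial_feats * alpha[:, None], alpha

# toy example: 4 frames, 8-dim facial features, 5-dim pose/gaze features
rng = np.random.default_rng(0)
F = rng.normal(size=(4, 8))
G = rng.normal(size=(4, 5))
W = rng.normal(size=5)
weighted, alpha = pose_gaze_attention(F, G, W, 0.0)
assert weighted.shape == (4, 8)
assert np.all((alpha > 0.0) & (alpha < 1.0))
```

In a trained model, `W` and `b` would be learned end-to-end so that the gate reflects how informative the face is at each frame; the auxiliary line would instead feed the pose/gaze features into the regressor directly as a complementary signal.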