ABSTRACT
Continuous emotion recognition is of great significance in affective computing and human-computer interaction. Most existing methods for video-based continuous emotion recognition rely on facial expression. However, other cues, including head pose and eye gaze, are also closely related to human emotion but have not been well explored for continuous emotion recognition. On the one hand, head pose and eye gaze affect the credibility of facial expression features; on the other hand, they carry emotional cues of their own that are complementary to facial expression. Accordingly, in this paper we propose two ways to incorporate these two cues into continuous emotion recognition: an attention mechanism based on head pose and eye gaze that guides the utilization of facial features, and an auxiliary line that helps extract additional emotion information from head pose and eye gaze themselves. Experiments are conducted on RECOLA, a benchmark database for continuous emotion recognition, and the results show that our framework outperforms other state-of-the-art methods by making full use of head pose and eye gaze cues in addition to facial expression.
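The attention mechanism described in the abstract, which weights facial features by their credibility given head pose and eye gaze, can be sketched as a simple per-frame gating. The snippet below is a minimal illustration under assumptions, not the paper's actual architecture: the linear scoring layer, the sigmoid gate, and all names (`pose_gaze_attention`, `W`, `b`) are hypothetical.

```python
import numpy as np

def pose_gaze_attention(facial_feats, pose_gaze_feats, W, b):
    """Gate per-frame facial features with a scalar credibility score
    derived from head-pose / eye-gaze features (hypothetical sketch).

    facial_feats:    (T, D) facial expression features per frame
    pose_gaze_feats: (T, K) concatenated head-pose and eye-gaze features
    W, b:            (K,) weight vector and scalar bias of a linear scorer
    """
    scores = pose_gaze_feats @ W + b          # (T,) raw credibility scores
    alpha = 1.0 / (1.0 + np.exp(-scores))     # sigmoid -> weights in (0, 1)
    # frames with extreme pose / averted gaze would get low alpha,
    # down-weighting their (less reliable) facial features
    return facial_feats * alpha[:, None], alpha

# toy example: 4 frames, 8-dim facial features, 5-dim pose/gaze features
rng = np.random.default_rng(0)
F = rng.normal(size=(4, 8))
G = rng.normal(size=(4, 5))
W = rng.normal(size=5)
weighted, alpha = pose_gaze_attention(F, G, W, 0.0)
assert weighted.shape == (4, 8)
assert np.all((alpha > 0.0) & (alpha < 1.0))
```

In a trained model, `W` and `b` would be learned end-to-end so that the gate reflects how informative the face is at each frame; the auxiliary line would instead feed the pose/gaze features into the regressor directly as a complementary signal.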