DOI: 10.1145/3340555.3353739

Continuous Emotion Recognition in Videos by Fusing Facial Expression, Head Pose and Eye Gaze

Published: 14 October 2019

ABSTRACT

Continuous emotion recognition is of great significance in affective computing and human-computer interaction. Most existing methods for video-based continuous emotion recognition rely on facial expression alone. However, other cues, including head pose and eye gaze, are also closely related to human emotion but have not been well explored in the continuous emotion recognition task. On the one hand, head pose and eye gaze affect how reliable facial expression features are. On the other hand, head pose and eye gaze carry emotional cues of their own, which are complementary to facial expression. Accordingly, in this paper we propose two ways to incorporate these two cues into continuous emotion recognition: an attention mechanism based on head pose and eye gaze cues that guides the use of facial features, and an auxiliary line that helps extract more useful emotion information from head pose and eye gaze. Experiments are conducted on the RECOLA dataset, a database for continuous emotion recognition, and the results show that our framework outperforms other state-of-the-art methods owing to its full use of head pose and eye gaze cues in addition to facial expression.
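
The abstract describes the two mechanisms only at a high level, so the following is a minimal sketch of one plausible reading of them, not the authors' implementation: it assumes per-frame facial, head-pose and eye-gaze feature vectors, models the "attention" as a learned sigmoid gate that rescales facial features according to pose and gaze, and models the "auxiliary line" as a small regressor on pose and gaze whose output is fused with the facial prediction of arousal/valence. All module names, dimensions, and the late-fusion scheme are assumptions.

```python
# Illustrative sketch only: the abstract does not specify the architecture,
# so layer sizes, the sigmoid gate, and the fusion scheme are assumptions.
import torch
import torch.nn as nn

class GazePoseGatedFusion(nn.Module):
    def __init__(self, face_dim=256, pose_dim=3, gaze_dim=4, out_dim=2):
        super().__init__()
        # "Attention": a gate computed from head pose + eye gaze that scales
        # facial features, down-weighting them when the face is less reliable.
        self.gate = nn.Sequential(
            nn.Linear(pose_dim + gaze_dim, face_dim),
            nn.Sigmoid(),
        )
        # Main branch: gated facial features -> arousal/valence.
        self.face_head = nn.Linear(face_dim, out_dim)
        # "Auxiliary line": head pose and eye gaze carry emotion information themselves.
        self.aux_head = nn.Sequential(
            nn.Linear(pose_dim + gaze_dim, 64),
            nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, face_feat, pose, gaze):
        ctx = torch.cat([pose, gaze], dim=-1)      # (T, pose_dim + gaze_dim)
        gated_face = face_feat * self.gate(ctx)    # per-dimension reliability gate
        main_pred = self.face_head(gated_face)     # prediction from facial cues
        aux_pred = self.aux_head(ctx)              # prediction from pose/gaze cues
        return main_pred + aux_pred                # simple late fusion (assumed)

# Example: 100 frames of per-frame features.
model = GazePoseGatedFusion()
face = torch.randn(100, 256)   # e.g. CNN facial features
pose = torch.randn(100, 3)     # head pose angles (pitch, yaw, roll)
gaze = torch.randn(100, 4)     # eye gaze direction features
arousal_valence = model(face, pose, gaze)  # (100, 2)
```

In a full continuous-recognition pipeline, per-frame outputs like these would typically pass through a temporal model (e.g. an LSTM or temporal convolutional network) before being compared with the continuous arousal/valence annotations; the abstract does not specify that part of the system.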


Published in

ICMI '19: 2019 International Conference on Multimodal Interaction
October 2019, 601 pages
ISBN: 9781450368605
DOI: 10.1145/3340555

Copyright © 2019 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• Research article
• Refereed limited

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%
