Multi-Stream Temporal Networks for Emotion Recognition in Children and in the Wild

Filntisis, Panagiotis P.; Efthymiou, Niki; Potamianos, Gerasimos; Maragos, Petros

doi:10.1007/978-3-031-50269-9_10

Abstract

In this chapter, we extend the temporal segment networks framework and apply it to emotion recognition in children and in the wild. To that end, we explore the effect of different information streams (Body, Face, Context, Audio, Word Embeddings) and representations (RGB, Flow). We perform an extensive ablation analysis, including the effect of each representation and modality on different emotions, and compare the performance of the proposed systems against previous state-of-the-art (SoTA) methods on the EmoReact and BoLD datasets.
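As a rough illustration of the idea behind temporal segment networks with several streams, the sketch below shows sparse per-segment frame sampling, an average consensus over segment scores, and a weighted late fusion of per-stream scores. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the averaging consensus, and the late-fusion scheme are all hypothetical choices made for illustration, since the abstract does not specify how the streams are combined.

```python
import random
from statistics import fmean


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Split a clip into equal segments and pick one frame index per segment.

    With rng=None the centre frame of each segment is used (deterministic);
    otherwise a random frame inside each segment is drawn.
    """
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(k * seg_len)
        end = max(start + 1, int((k + 1) * seg_len))  # exclusive bound
        if rng is None:
            indices.append((start + end - 1) // 2)
        else:
            indices.append(rng.randrange(start, end))
    return indices


def segment_consensus(segment_scores):
    """Average per-class scores over segments (assumed consensus function)."""
    return [fmean(column) for column in zip(*segment_scores)]


def late_fusion(stream_scores, weights=None):
    """Weighted average of per-class scores across streams (hypothetical scheme)."""
    names = list(stream_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    num_classes = len(stream_scores[names[0]])
    return [
        sum(weights[name] * stream_scores[name][c] for name in names) / total
        for c in range(num_classes)
    ]


# Usage: three segments from a 30-frame clip, then fuse two illustrative streams.
frame_ids = sample_segment_indices(30, 3)
body = segment_consensus([[0.2, 0.8], [0.4, 0.6]])
fused = late_fusion({"body": body, "face": [0.5, 0.5]})
```

In a real system each stream (for example Body or Face, in RGB or Flow) would be a neural network producing the per-segment scores; here the scores are hand-written placeholders so that the sampling and aggregation logic can be read in isolation.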

Notes

1. PyTorch code available at https://github.com/filby89/NTUA-BEEU-eccv2020.
2. We have made the code for the experiments publicly available at https://github.com/filby89/multimodal-emotion-recognition.

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Panagiotis P. Filntisis, Niki Efthymiou & Petros Maragos
Department of Electrical and Computer Engineering, University of Thessaly, Volos, Greece
Gerasimos Potamianos

Corresponding author

Correspondence to Panagiotis P. Filntisis.

Editor information

Editors and Affiliations

Information Sciences and Technology, The Pennsylvania State University, University Park, PA, USA
James Z. Wang
Department of Psychology, The Pennsylvania State University, University Park, PA, USA
Reginald B. Adams, Jr.

About this chapter

Cite this chapter

Filntisis, P.P., Efthymiou, N., Potamianos, G., Maragos, P. (2024). Multi-Stream Temporal Networks for Emotion Recognition in Children and in the Wild. In: Wang, J.Z., Adams, Jr., R.B. (eds) Modeling Visual Aesthetics, Emotion, and Artistic Style. Springer, Cham. https://doi.org/10.1007/978-3-031-50269-9_10

DOI: https://doi.org/10.1007/978-3-031-50269-9_10
Published: 25 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50268-2
Online ISBN: 978-3-031-50269-9
eBook Packages: Computer Science, Computer Science (R0)
