Abstract
In this chapter, we extend the temporal segment networks framework to emotion recognition in children and in the wild. To that end, we explore the effect of different information streams (Body, Face, Context, Audio, Word Embeddings) and input representations (RGB, optical flow). We perform an extensive ablation analysis, including the effect of each representation and modality on individual emotions, and compare the proposed systems against previous state-of-the-art methods on the EmoReact and BoLD datasets.
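As a rough illustration of the ideas named in the abstract, the sketch below shows sparse segment sampling and averaging consensus in the style of temporal segment networks, followed by a simple score-level fusion across streams. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names are hypothetical, the center-frame sampling is the deterministic variant usually used at test time, and plain averaging over streams is an assumed fusion rule that may differ from the one the authors use.

```python
from statistics import fmean


def segment_centers(num_frames: int, num_segments: int) -> list[int]:
    """Split a video into equal spans and return the center frame of each.

    Temporal segment networks sample one snippet per span; training
    typically draws a random frame inside the span, while this helper
    uses the span center so the result is deterministic.
    """
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    span = num_frames / num_segments
    return [int(span * k + span / 2) for k in range(num_segments)]


def consensus(snippet_scores: list[list[float]]) -> list[float]:
    """Average per-snippet class scores into one video-level score vector."""
    return [fmean(column) for column in zip(*snippet_scores)]


def late_fusion(stream_scores: dict[str, list[float]]) -> list[float]:
    """Average video-level scores over streams (assumed fusion rule)."""
    return [fmean(column) for column in zip(*stream_scores.values())]


if __name__ == "__main__":
    # Hypothetical two-class scores for two of the streams named above.
    body = consensus([[0.2, 0.8], [0.4, 0.6]])
    face = consensus([[0.6, 0.4], [0.8, 0.2]])
    print(segment_centers(30, 3))
    print(late_fusion({"body": body, "face": face}))
```

Each stream (for example Body or Face) would produce its own per-snippet scores from a separate network; only the aggregation steps are shown here.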
Notes
1. PyTorch code available at https://github.com/filby89/NTUA-BEEU-eccv2020.
2. We have made the code for the experiments publicly available at https://github.com/filby89/multimodal-emotion-recognition.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Filntisis, P.P., Efthymiou, N., Potamianos, G., Maragos, P. (2024). Multi-Stream Temporal Networks for Emotion Recognition in Children and in the Wild. In: Wang, J.Z., Adams, Jr., R.B. (eds) Modeling Visual Aesthetics, Emotion, and Artistic Style. Springer, Cham. https://doi.org/10.1007/978-3-031-50269-9_10
DOI: https://doi.org/10.1007/978-3-031-50269-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50268-2
Online ISBN: 978-3-031-50269-9
eBook Packages: Computer Science (R0)