
Multi-Stream Temporal Networks for Emotion Recognition in Children and in the Wild

  • Chapter in: Modeling Visual Aesthetics, Emotion, and Artistic Style

Abstract

In this chapter, we extend and leverage the temporal segment networks framework for emotion recognition in children and in the wild. To that end, we explore the effect of different information streams (Body, Face, Context, Audio, Word Embeddings) and representations (RGB, Flow). We perform an extensive ablation analysis, including the effect of each representation and modality on different emotions, and compare the performance of the proposed systems against previous state-of-the-art (SoTA) methods on the EmoReact and BoLD datasets.
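
To make the multi-stream design concrete, below is a minimal PyTorch sketch (an illustration under assumed names, not the authors' released implementation; see the Notes for the actual code). Each stream applies a TSN-style 2D backbone to K sampled segments, averages the per-segment class scores (segmental consensus), and the per-stream logits are fused by averaging:

import torch
import torch.nn as nn
from torchvision.models import resnet50

class StreamTSN(nn.Module):
    """One stream (e.g. Face-RGB or Body-Flow) scored over K segments."""
    def __init__(self, num_classes: int, in_channels: int = 3):
        super().__init__()
        backbone = resnet50(weights=None)
        if in_channels != 3:  # e.g. a stack of optical-flow frames
            backbone.conv1 = nn.Conv2d(in_channels, 64, 7, 2, 3, bias=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
        self.backbone = backbone

    def forward(self, x):  # x: (B, K, C, H, W)
        b, k = x.shape[:2]
        scores = self.backbone(x.flatten(0, 1))   # (B*K, num_classes)
        return scores.view(b, k, -1).mean(dim=1)  # segmental consensus

class MultiStreamTSN(nn.Module):
    """Late fusion of several streams by averaging their logits."""
    def __init__(self, num_classes: int, stream_channels: dict):
        super().__init__()
        self.streams = nn.ModuleDict({
            name: StreamTSN(num_classes, ch)
            for name, ch in stream_channels.items()
        })

    def forward(self, inputs: dict):  # {stream name: (B, K, C, H, W)}
        logits = [self.streams[n](x) for n, x in inputs.items()]
        return torch.stack(logits).mean(dim=0)

# Hypothetical usage: two RGB streams plus a 10-channel stacked-flow stream.
model = MultiStreamTSN(num_classes=8,
                       stream_channels={"face_rgb": 3, "body_rgb": 3, "flow": 10})
batch = {name: torch.randn(2, 3, ch, 224, 224)  # B=2 clips, K=3 segments
         for name, ch in [("face_rgb", 3), ("body_rgb", 3), ("flow", 10)]}
out = model(batch)  # (2, 8) fused emotion logits

Late score fusion is used here only for simplicity; the stream names, the ResNet-50 backbone, and the 10-channel flow input are illustrative assumptions, not details taken from the chapter.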


Notes

  1. PyTorch code available at https://github.com/filby89/NTUA-BEEU-eccv2020.

  2. We have made the code for the experiments publicly available at https://github.com/filby89/multimodal-emotion-recognition.


Author information

Correspondence to Panagiotis P. Filntisis.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Filntisis, P.P., Efthymiou, N., Potamianos, G., Maragos, P. (2024). Multi-Stream Temporal Networks for Emotion Recognition in Children and in the Wild. In: Wang, J.Z., Adams, Jr., R.B. (eds) Modeling Visual Aesthetics, Emotion, and Artistic Style. Springer, Cham. https://doi.org/10.1007/978-3-031-50269-9_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-50269-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50268-2

  • Online ISBN: 978-3-031-50269-9

  • eBook Packages: Computer Science, Computer Science (R0)
