Multimodal Emotion Classification Supported in the Aggregation of Pre-trained Classification Models

  • Conference paper
Computational Science – ICCS 2023 (ICCS 2023)

Abstract

Human-centric artificial intelligence struggles to build automated procedures that recognize emotions and that can be integrated into artificial systems, such as user interfaces or social robots. In this context, this paper investigates building an Emotion Multi-modal Aggregator (EMmA) that relies on a collection of open-source, single-source emotion classification methods whose outputs are aggregated to produce an emotion prediction. Although extensible, the tested solution takes a video clip and splits it into its frames and audio. A collection of primary classifiers is then applied to each source, and their results are combined in a final classifier using machine learning aggregation techniques. The aggregation techniques tested were Random Forest and k-Nearest Neighbors, which, with an accuracy of 80%, outperformed the primary classifiers on the selected dataset.
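The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes that each primary classifier returns one score per emotion and that the aggregator is trained on the concatenation of those scores; the label set, the scores, and the function names (`build_feature_vector`, `knn_aggregate`) are hypothetical and are not taken from the paper. A plain k-Nearest Neighbors vote is written out by hand to keep the example dependency-free; a Random Forest would consume the same concatenated vectors.

```python
from collections import Counter
from math import dist

# Illustrative label order for the score vectors (an assumption, not the paper's set).
EMOTIONS = ["angry", "happy", "sad"]


def build_feature_vector(primary_outputs):
    """Concatenate the per-emotion scores of every primary classifier."""
    return [score for output in primary_outputs for score in output]


def knn_aggregate(train_vectors, train_labels, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training clips: (face-classifier scores, audio-classifier scores), true label.
training_clips = [
    ([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]], "angry"),
    ([[0.8, 0.1, 0.1], [0.4, 0.4, 0.2]], "angry"),
    ([[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]], "happy"),
    ([[0.2, 0.6, 0.2], [0.3, 0.6, 0.1]], "happy"),
    ([[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]], "sad"),
    ([[0.3, 0.3, 0.4], [0.1, 0.2, 0.7]], "sad"),
]
train_vectors = [build_feature_vector(outputs) for outputs, _ in training_clips]
train_labels = [label for _, label in training_clips]

# New clip: the face classifier leans slightly towards "angry",
# while the audio classifier clearly favours "happy".
new_clip = [[0.45, 0.40, 0.15], [0.1, 0.8, 0.1]]
prediction = knn_aggregate(train_vectors, train_labels, build_feature_vector(new_clip))
print(prediction)
```

In this toy case the aggregator sides with the audio classifier because the closest training clips in the joint score space are mostly labelled "happy", which is the kind of correction an aggregator is meant to provide over a single primary classifier.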

We thank the Portuguese Foundation for Science and Technology (FCT) under Project UIDB/50009/2020—LARSyS.

Notes

  1. See, e.g., [51] for the definitions of Accuracy, Precision, Recall, and F1-score.
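As a reminder of what these measures compute, the following sketch uses the standard per-class definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as their harmonic mean). The labels are made up, the helper names are hypothetical, and the averaging scheme used in the paper is not stated here, so only per-class values are shown.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1-score for one target label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for six clips.
y_true = ["happy", "happy", "sad", "sad", "angry", "angry"]
y_pred = ["happy", "sad", "sad", "sad", "angry", "happy"]

print(accuracy(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred, "sad"))
```

Here four of the six predictions are correct, and for the label "sad" there are two true positives, one false positive, and no false negatives.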

References

  1. Abdi, J., Al-Hindawi, A., Ng, T., Vizcaychipi, M.P.: Scoping review on the use of socially assistive robot technology in elderly care. BMJ Open 8(2), e018815 (2018). https://doi.org/10.1136/bmjopen-2017-018815

  2. Abdollahi, H., Mahoor, M., Zandie, R., Sewierski, J., Qualls, S.: Artificial emotional intelligence in socially assistive robots for older adults: A pilot study. IEEE Trans. Affect. Comput. (2022). https://doi.org/10.1109/taffc.2022.3143803

  3. Ahmed, F., Bari, A.S.M.H., Gavrilova, M.L.: Emotion recognition from body movement. IEEE Access 8, 11761–11781 (2020). https://doi.org/10.1109/ACCESS.2019.2963113

  4. Ali, G., et al.: Artificial neural network based ensemble approach for multicultural facial expressions analysis. IEEE Access 8, 134950–134963 (2020). https://doi.org/10.1109/ACCESS.2020.3009908

  5. Alonso-Martín, F., Malfaz, M., Sequeira, J., Gorostiza, J.F., Salichs, M.A.: A multimodal emotion detection system during human-robot interaction. Sensors 13(11), 15549–15581 (2013). https://doi.org/10.3390/s131115549, https://www.mdpi.com/1424-8220/13/11/15549

  6. Ardabili, S., Mosavi, A., Várkonyi-Kóczy, A.R.: Advances in machine learning modeling reviewing hybrid and ensemble methods, pp. 215–227 (2020). https://doi.org/10.1007/978-3-030-36841-8_21

  7. Banerjee, R., De, S., Dey, S.: A survey on various deep learning algorithms for an efficient facial expression recognition system. Int. J. Image Graph. (2021). https://doi.org/10.1142/S0219467822400058

  8. Benamara, N.K., et al.: Real-time facial expression recognition using smoothed deep neural network ensemble. Integrated Comput. Aided Eng. 28(1), 97–111 (2020). https://doi.org/10.3233/ICA-200643

  9. Bhatia, A., Rathee, A.: Multimodal emotion recognition (2020). https://github.com/ankurbhatia24/multimodal-emotion-recognition (Accessed 31 Jan 2023)

  10. Birjali, M., Kasri, M., Beni-Hssane, A.: A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl.-Based Syst. 226, 107134 (2021). https://doi.org/10.1016/j.knosys.2021.107134, https://www.sciencedirect.com/science/article/pii/S095070512100397X

  11. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B., et al.: A database of german emotional speech. In: Interspeech, vol. 5, pp. 1517–1520 (2005)

  12. Burnwal, S.: Speech emotion recognition (2020). https://www.kaggle.com/code/shivamburnwal/speech-emotion-recognition/notebook (Accessed 31 Jan 2023)

  13. Busso, C., et al.: Iemocap: Interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42, 335–359 (2008)

  14. Canedo, D., Neves, A.: Mood estimation based on facial expressions and postures. In: Proceedings of the RECPAD, pp. 49–50 (2020)

  15. Cao, H., Cooper, D.G., Keutmann, M.K., Gur, R.C., Nenkova, A., Verma, R.: CREMA-d: Crowd-sourced emotional multimodal actors dataset. IEEE Trans. Affect. Comput. 5(4), 377–390 (2014). https://doi.org/10.1109/TAFFC.2014.2336244

  16. Chen, M., He, X., Yang, J., Zhang, H.: 3-d convolutional recurrent neural networks with attention model for speech emotion recognition. IEEE Signal Process. Lett. 25(10), 1440–1444 (2018). https://doi.org/10.1109/LSP.2018.2860246

  17. Cheng, B., Wang, Y., Shao, D., Arora, C., Hoang, T., Liu, X.: Edge4emotion: An edge computing based multi-source emotion recognition platform for human-centric software engineering. In: 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 610–613 (2021). https://doi.org/10.1109/CCGrid51090.2021.00071

  18. Ekman, P.: Facial expressions of emotion: New findings, new questions. Psychol. Sci. 3(1), 34–38 (1992). https://doi.org/10.1111/j.1467-9280.1992.tb00253.x

  19. Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 17(2), 124–129 (1971). https://doi.org/10.1037/h0030377

  20. Filntisis, P.P., Efthymiou, N., Koutras, P., Potamianos, G., Maragos, P.: Fusing body posture with facial expressions for joint recognition of affect in child-robot interaction. IEEE Robot. Autom. Lett. 4(4), 4011–4018 (2019). https://doi.org/10.1109/LRA.2019.2930434

  21. Getson, C., Nejat, G.: Socially assistive robots helping older adults through the pandemic and life after COVID-19. Robotics 10(3), 106 (2021). https://doi.org/10.3390/robotics10030106

  22. Goodfellow, I.J., et al.: Challenges in representation learning: a report on three machine learning contests. In: Lee, M., Hirose, A., Hou, Z.-G., Kil, R.M. (eds.) ICONIP 2013. LNCS, vol. 8228, pp. 117–124. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42051-1_16

  23. Haq, S., Jackson, P.: Machine Audition: Principles, Algorithms and Systems, chap. Multimodal Emotion Recognition, pp. 398–423. IGI Global, Hershey PA (Aug 2010)

  24. Heredia, J., et al.: Adaptive multimodal emotion detection architecture for social robots. IEEE Access 10, 20727–20744 (2022). https://doi.org/10.1109/ACCESS.2022.3149214

  25. Jolly, E., Cheong, J.H., Xie, T., Byrne, S., Kenny, M., Chang, L.J.: Py-feat: Python facial expression analysis toolbox. arXiv preprint arXiv:2104.03509 (2021)

  26. Jolly, E., Cheong, J.H., Xie, T., Byrne, S., Kenny, M., Chang, L.J.: Py-feat: Python facial expression analysis toolbox (2021). https://doi.org/10.48550/arXiv.2104.03509

  27. Jolly, E., Cheong, J.H., Xie, T., Byrne, S., Kenny, M., Chang, L.J.: Py-feat: Python facial expression analysis toolbox (2023). https://pythonrepo.com/repo/cosanlab-py-feat-python-deep-learning (Accessed 31 Jan 2023)

  28. Kleinsmith, A., Bianchi-Berthouze, N.: Affective body expression perception and recognition: A survey. IEEE Trans. Affect. Comput. 4(1), 15–33 (2013). https://doi.org/10.1109/T-AFFC.2012.16

  29. Kumaran, U., Radha Rammohan, S., Nagarajan, S.M., Prathik, A.: Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN. Int. J. Speech Technol. 24(2), 303–314 (2021). https://doi.org/10.1007/s10772-020-09792-x

  30. Li, S., Deng, W.: Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition. IEEE Trans. Image Process. 28(1), 356–370 (2019)

  31. Liang, G., Wang, S., Wang, C.: Pose-aware adversarial domain adaptation for personalized facial expression recognition. arXiv preprint arXiv:2007.05932 (2020)

  32. Livingstone, S.R., Russo, F.A.: The ryerson audio-visual database of emotional speech and song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in north american english. PLoS ONE 13(5), e0196391 (2018). https://doi.org/10.1371/journal.pone.0196391

  33. Noroozi, F., Corneanu, C.A., Kamińska, D., Sapiński, T., Escalera, S., Anbarjafari, G.: Survey on emotional body gesture recognition. IEEE Trans. Affect. Comput. 12(2), 505–523 (2018)

  34. Novais, R., Cardoso, P.J.S., Rodrigues, J.M.F.: Emotion classification from speech by an ensemble strategy. In: ACM 10th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion (DSAI 2022) (2022)

  35. Novais, R., Cardoso, P.J.S., Rodrigues, J.M.F.: Facial emotions classification supported in an ensemble strategy, pp. 477–488 (2022). https://doi.org/10.1007/978-3-031-05028-2_32

  36. Ortega, J.D.S., Cardinal, P., Koerich, A.L.: Emotion recognition using fusion of audio and video features. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 3847–3852 (2019). https://doi.org/10.1109/SMC.2019.8914655

  37. Palanisamy, K., Singhania, D., Yao, A.: Rethinking cnn models for audio classification (2020). https://doi.org/10.48550/arXiv.2007.11154

  38. Pecoraro, R., Basile, V., Bono, V.: Local multi-head channel self-attention for facial expression recognition. Information 13(9), 419 (2022)

  39. Pecoraro, R., Basile, V., Bono, V., Gallo, S.: Lhc-net: Local multi-head channel self-attention (code). https://github.com/bodhis4ttva/lhc_net (Accessed 29 Jan 2023)

  40. Pedregosa, F., et al.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  41. Pichora-Fuller, M.K., Dupuis, K.: Toronto emotional speech set (tess) (2020). https://doi.org/10.5683/SP2/E8H2MF

  42. de Pinto, M.G.: Audio emotion classification from multiple datasets (2020). https://github.com/marcogdepinto/emotion-classification-from-audio-files (Accessed 31 Jan 2023)

  43. de Pinto, M.G., Polignano, M., Lops, P., Semeraro, G.: Emotions understanding model from spoken language using deep neural networks and mel-frequency cepstral coefficients (May 2020). https://doi.org/10.1109/EAIS48028.2020.9122698

  44. Popova, A.S., Rassadin, A.G., Ponomarenko, A.A.: Emotion recognition in sound, pp. 117–124 (Aug 2017). https://doi.org/10.1007/978-3-319-66604-4_18

  45. Poria, S., Hazarika, D., Majumder, N., Mihalcea, R.: Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research. IEEE Trans. Affect. Comput. (2020)

  46. Poria, S., Hazarika, D., Majumder, N., Naik, G., Cambria, E., Mihalcea, R.: MELD: A multimodal multi-party dataset for emotion recognition in conversations. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 527–536 (2019)

  47. Revina, I., Emmanuel, W.S.: A survey on human face expression recognition techniques. J. King Saud Univ. - Comput. Inform. Sci. 33(6), 619–628 (2021). https://doi.org/10.1016/j.jksuci.2018.09.002

  48. Seknedy, M.E., Fawzi, S.: Speech emotion recognition system for human interaction applications (Dec 2021). https://doi.org/10.1109/ICICIS52592.2021.9694246

  49. Shenk, J., CG, A., Arriaga, O., Owlwasrowk: justinshenk/fer: Zenodo (Sep 2021). https://doi.org/10.5281/zenodo.5362356

  50. Siddiqui, M.F.H., Javaid, A.Y.: A multimodal facial emotion recognition framework through the fusion of speech with visible and infrared images. Multimodal Technol. Interact. 4(3), 46 (2020). https://doi.org/10.3390/mti4030046

  51. Sokolova, M., Lapalme, G.: A systematic analysis of performance measures for classification tasks. Inform. Process. Manage. 45(4), 427–437 (2009)

  52. Sorrentino, A., Mancioppi, G., Coviello, L., Cavallo, F., Fiorini, L.: Feasibility study on the role of personality, emotion, and engagement in socially assistive robotics: A cognitive assessment scenario. Informatics 8(2), 23 (2021). https://doi.org/10.3390/informatics8020023

  53. Stock-Homburg, R.: Survey of emotions in human–robot interactions: perspectives from robotic psychology on 20 years of research. Int. J. Soc. Robot. 14(2), 389–411 (2021). https://doi.org/10.1007/s12369-021-00778-6

  54. Wang, Z., Zeng, F., Liu, S., Zeng, B.: OAENet: Oriented attention ensemble for accurate facial expression recognition. Pattern Recogn. 112, 107694 (2021). https://doi.org/10.1016/j.patcog.2020.107694

  55. Zavaschi, T.H.H., Koerich, A.L., Oliveira, L.E.S.: Facial expression recognition using ensemble of classifiers (May 2011). https://doi.org/10.1109/ICASSP.2011.5946775

  56. Zhang, F., Zhang, T., Mao, Q., Xu, C.: Joint pose and expression modeling for facial expression recognition (Jun 2018). https://doi.org/10.1109/CVPR.2018.00354

Author information

Corresponding author

Correspondence to Pedro J. S. Cardoso.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cardoso, P.J.S., Rodrigues, J.M.F., Novais, R. (2023). Multimodal Emotion Classification Supported in the Aggregation of Pre-trained Classification Models. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 10477. Springer, Cham. https://doi.org/10.1007/978-3-031-36030-5_35

  • DOI: https://doi.org/10.1007/978-3-031-36030-5_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36029-9

  • Online ISBN: 978-3-031-36030-5

  • eBook Packages: Computer Science, Computer Science (R0)
