Multimodal Emotion Classification Supported in the Aggregation of Pre-trained Classification Models

Cardoso, Pedro J. S.; Rodrigues, João M. F.; Novais, Rui

doi:10.1007/978-3-031-36030-5_35

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10477)

Included in the following conference series:

International Conference on Computational Science


Abstract

Human-centric artificial intelligence struggles to build automated procedures that recognize emotions and that can be integrated into artificial systems, such as user interfaces or social robots. In this context, this paper investigates the construction of an Emotion Multi-modal Aggregator (EMmA) that relies on a collection of open-source, single-source emotion classification methods whose outputs are aggregated to produce an emotion prediction. Although extendable, the tested solution takes a video clip and divides it into its frames and audio. A collection of primary classifiers is then applied to each source, and their results are combined in a final classifier using machine learning aggregation techniques. The aggregation techniques tested were Random Forest and k-Nearest Neighbors, which, with an accuracy of 80%, outperformed the primary classifiers on the selected dataset.
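The abstract describes a late-fusion design: pre-trained primary classifiers produce per-source predictions, and a second-level classifier learns to combine them. The sketch below shows one way such an aggregator could be wired with scikit-learn (which appears in the reference list). It is not the authors' EMmA implementation. The number of classes, frames, and primary classifiers, the averaging of frame-level outputs, and the helper `simulate_clip` (a hypothetical stand-in for real pre-trained face and speech models) are all assumptions. Because the data are synthetic, the printed accuracies say nothing about the paper's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

N_CLASSES = 7     # hypothetical number of emotion labels
N_FRAMES = 8      # hypothetical number of frames sampled per clip
N_IMAGE_CLF = 2   # hypothetical number of pre-trained facial classifiers
N_AUDIO_CLF = 2   # hypothetical number of pre-trained speech classifiers


def softmax(logits):
    """Turn logits into probabilities along the last axis."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def simulate_clip(label, rng):
    """Hypothetical stand-in for running the primary classifiers on one clip.

    Returns one (N_FRAMES, N_CLASSES) probability array per image classifier
    (one row per frame) and one (N_CLASSES,) array per audio classifier.
    Each simulated classifier is noisy but biased towards the true label.
    """
    def noisy(shape):
        logits = rng.normal(0.0, 1.0, size=shape)
        logits[..., label] += 1.0
        return softmax(logits)

    frame_probs = [noisy((N_FRAMES, N_CLASSES)) for _ in range(N_IMAGE_CLF)]
    audio_probs = [noisy((N_CLASSES,)) for _ in range(N_AUDIO_CLF)]
    return frame_probs, audio_probs


def aggregate_features(frame_probs, audio_probs):
    """Build the aggregator's input: one fixed-length vector per clip.

    Frame-level outputs are averaged over time (an assumption), then the
    probability vectors of all primary classifiers are concatenated.
    """
    video_part = [p.mean(axis=0) for p in frame_probs]
    return np.concatenate(video_part + list(audio_probs))


def build_dataset(n_clips, rng):
    X, y = [], []
    for _ in range(n_clips):
        label = int(rng.integers(N_CLASSES))
        frame_probs, audio_probs = simulate_clip(label, rng)
        X.append(aggregate_features(frame_probs, audio_probs))
        y.append(label)
    return np.array(X), np.array(y)


rng = np.random.default_rng(0)
X, y = build_dataset(600, rng)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline: trust a single primary classifier (the last audio model) alone.
baseline_pred = X_test[:, -N_CLASSES:].argmax(axis=1)
print(f"single primary classifier: {accuracy_score(y_test, baseline_pred):.2f}")

# Second-level aggregators trained on the concatenated primary outputs.
aggregators = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
}
for name, clf in aggregators.items():
    clf.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```

Treating the primary outputs as features lets the second-level model learn which source to trust for which emotion, which a fixed rule such as majority voting does not do; Random Forest and k-NN are simply the two aggregators named in the abstract.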

We thank the Portuguese Foundation for Science and Technology (FCT) under Project UIDB/50009/2020—LARSyS.


Notes

1. See, e.g., [51] for the definitions of Accuracy, Precision, Recall, and F1-score.
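The footnote defers the metric definitions to [51]. As a quick way to check such figures, the sketch below computes accuracy and precision, recall, and F1-score from a confusion matrix, using precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 = 2PR/(P+R) per class. Averaging the per-class scores without weights (macro averaging) is an assumption here; the abstract does not state which averaging the authors used.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def classification_metrics(y_true, y_pred, n_classes):
    cm = confusion_matrix(y_true, y_pred, n_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class k, but truly another class
    fn = cm.sum(axis=1) - tp  # truly class k, but predicted as another class
    with np.errstate(divide="ignore", invalid="ignore"):
        # Classes with an empty denominator get a score of 0 by convention.
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),  # macro average over classes
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
for metric, value in classification_metrics(y_true, y_pred, n_classes=3).items():
    print(f"{metric}: {value:.3f}")
```

For this toy input, 4 of 6 predictions are correct (accuracy 0.667), and the per-class F1-scores are 0.5, 0.8, and 2/3, giving a macro F1-score of about 0.656.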

References

1. Abdi, J., Al-Hindawi, A., Ng, T., Vizcaychipi, M.P.: Scoping review on the use of socially assistive robot technology in elderly care. BMJ Open 8(2), e018815 (2018). https://doi.org/10.1136/bmjopen-2017-018815
2. Abdollahi, H., Mahoor, M., Zandie, R., Sewierski, J., Qualls, S.: Artificial emotional intelligence in socially assistive robots for older adults: A pilot study. IEEE Trans. Affect. Comput. (2022). https://doi.org/10.1109/taffc.2022.3143803
3. Ahmed, F., Bari, A.S.M.H., Gavrilova, M.L.: Emotion recognition from body movement. IEEE Access 8, 11761–11781 (2020). https://doi.org/10.1109/ACCESS.2019.2963113
4. Ali, G., et al.: Artificial neural network based ensemble approach for multicultural facial expressions analysis. IEEE Access 8, 134950–134963 (2020). https://doi.org/10.1109/ACCESS.2020.3009908
5. Alonso-Martín, F., Malfaz, M., Sequeira, J., Gorostiza, J.F., Salichs, M.A.: A multimodal emotion detection system during human-robot interaction. Sensors 13(11), 15549–15581 (2013). https://doi.org/10.3390/s131115549, https://www.mdpi.com/1424-8220/13/11/15549
6. Ardabili, S., Mosavi, A., Várkonyi-Kóczy, A.R.: Advances in machine learning modeling reviewing hybrid and ensemble methods, pp. 215–227 (2020). https://doi.org/10.1007/978-3-030-36841-8_21
7. Banerjee, R., De, S., Dey, S.: A survey on various deep learning algorithms for an efficient facial expression recognition system. Int. J. Image Graph. (2021). https://doi.org/10.1142/S0219467822400058
8. Benamara, N.K., et al.: Real-time facial expression recognition using smoothed deep neural network ensemble. Integrated Comput. Aided Eng. 28(1), 97–111 (2020). https://doi.org/10.3233/ICA-200643
9. Bhatia, A., Rathee, A.: Multimodal emotion recognition (2020). https://github.com/ankurbhatia24/multimodal-emotion-recognition (Accessed 31 Jan 2023)
10. Birjali, M., Kasri, M., Beni-Hssane, A.: A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl.-Based Syst. 226, 107134 (2021). https://doi.org/10.1016/j.knosys.2021.107134, https://www.sciencedirect.com/science/article/pii/S095070512100397X
11. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B., et al.: A database of German emotional speech. In: Interspeech, vol. 5, pp. 1517–1520 (2005)
12. Burnwal, S.: Speech emotion recognition (2020). https://www.kaggle.com/code/shivamburnwal/speech-emotion-recognition/notebook (Accessed 31 Jan 2023)
13. Busso, C., et al.: IEMOCAP: Interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42, 335–359 (2008)
14. Canedo, D., Neves, A.: Mood estimation based on facial expressions and postures. In: Proceedings of the RECPAD, pp. 49–50 (2020)
15. Cao, H., Cooper, D.G., Keutmann, M.K., Gur, R.C., Nenkova, A., Verma, R.: CREMA-D: Crowd-sourced emotional multimodal actors dataset. IEEE Trans. Affective Comput. 5(4), 377–390 (2014). https://doi.org/10.1109/TAFFC.2014.2336244
16. Chen, M., He, X., Yang, J., Zhang, H.: 3-D convolutional recurrent neural networks with attention model for speech emotion recognition. IEEE Signal Processing Lett. 25(10), 1440–1444 (2018). https://doi.org/10.1109/LSP.2018.2860246
17. Cheng, B., Wang, Y., Shao, D., Arora, C., Hoang, T., Liu, X.: Edge4Emotion: An edge computing based multi-source emotion recognition platform for human-centric software engineering. In: 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 610–613 (2021). https://doi.org/10.1109/CCGrid51090.2021.00071
18. Ekman, P.: Facial expressions of emotion: New findings, new questions. Psychol. Sci. 3(1), 34–38 (1992). https://doi.org/10.1111/j.1467-9280.1992.tb00253.x
19. Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 17(2), 124–129 (1971). https://doi.org/10.1037/h0030377
20. Filntisis, P.P., Efthymiou, N., Koutras, P., Potamianos, G., Maragos, P.: Fusing body posture with facial expressions for joint recognition of affect in child-robot interaction. IEEE Robot. Autom. Lett. 4(4), 4011–4018 (2019). https://doi.org/10.1109/LRA.2019.2930434
21. Getson, C., Nejat, G.: Socially assistive robots helping older adults through the pandemic and life after COVID-19. Robotics 10(3), 106 (2021). https://doi.org/10.3390/robotics10030106
22. Goodfellow, I.J., et al.: Challenges in representation learning: a report on three machine learning contests. In: Lee, M., Hirose, A., Hou, Z.-G., Kil, R.M. (eds.) ICONIP 2013. LNCS, vol. 8228, pp. 117–124. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42051-1_16
23. Haq, S., Jackson, P.: Machine Audition: Principles, Algorithms and Systems, chap. Multimodal Emotion Recognition, pp. 398–423. IGI Global, Hershey PA (Aug 2010)
24. Heredia, J., et al.: Adaptive multimodal emotion detection architecture for social robots. IEEE Access 10, 20727–20744 (2022). https://doi.org/10.1109/ACCESS.2022.3149214
25. Jolly, E., Cheong, J.H., Xie, T., Byrne, S., Kenny, M., Chang, L.J.: Py-feat: Python facial expression analysis toolbox. arXiv preprint arXiv:2104.03509 (2021)
26. Jolly, E., Cheong, J.H., Xie, T., Byrne, S., Kenny, M., Chang, L.J.: Py-feat: Python facial expression analysis toolbox (2021). https://doi.org/10.48550/arXiv.2104.03509
27. Jolly, E., Cheong, J.H., Xie, T., Byrne, S., Kenny, M., Chang, L.J.: Py-feat: Python facial expression analysis toolbox (2023). https://pythonrepo.com/repo/cosanlab-py-feat-python-deep-learning (Accessed 31 Jan 2023)
28. Kleinsmith, A., Bianchi-Berthouze, N.: Affective body expression perception and recognition: A survey. IEEE Trans. Affective Comput. 4(1), 15–33 (2013). https://doi.org/10.1109/T-AFFC.2012.16
29. Kumaran, U., Radha Rammohan, S., Nagarajan, S.M., Prathik, A.: Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN. Int. J. Speech Technol. 24(2), 303–314 (2021). https://doi.org/10.1007/s10772-020-09792-x
30. Li, S., Deng, W.: Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition. IEEE Trans. Image Process. 28(1), 356–370 (2019)
31. Liang, G., Wang, S., Wang, C.: Pose-aware adversarial domain adaptation for personalized facial expression recognition. arXiv preprint arXiv:2007.05932 (2020)
32. Livingstone, S.R., Russo, F.A.: The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5), e0196391 (2018). https://doi.org/10.1371/journal.pone.0196391
33. Noroozi, F., Corneanu, C.A., Kamińska, D., Sapiński, T., Escalera, S., Anbarjafari, G.: Survey on emotional body gesture recognition. IEEE Trans. Affect. Comput. 12(2), 505–523 (2018)
34. Novais, R., Cardoso, P.J.S., Rodrigues, J.M.F.: Emotion classification from speech by an ensemble strategy. In: ACM 10th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion (DSAI 2022) (2022)
35. Novais, R., Cardoso, P.J.S., Rodrigues, J.M.F.: Facial emotions classification supported in an ensemble strategy, pp. 477–488 (2022). https://doi.org/10.1007/978-3-031-05028-2_32
36. Ortega, J.D.S., Cardinal, P., Koerich, A.L.: Emotion recognition using fusion of audio and video features. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 3847–3852 (2019). https://doi.org/10.1109/SMC.2019.8914655
37. Palanisamy, K., Singhania, D., Yao, A.: Rethinking CNN models for audio classification (2020). https://doi.org/10.48550/arXiv.2007.11154
38. Pecoraro, R., Basile, V., Bono, V.: Local multi-head channel self-attention for facial expression recognition. Information 13(9), 419 (2022)
39. Pecoraro, R., Basile, V., Bono, V., Gallo, S.: LHC-Net: Local multi-head channel self-attention (code). https://github.com/bodhis4ttva/lhc_net (Accessed 29 Jan 2023)
40. Pedregosa, F., et al.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
41. Pichora-Fuller, M.K., Dupuis, K.: Toronto emotional speech set (TESS) (2020). https://doi.org/10.5683/SP2/E8H2MF
42. de Pinto, M.G.: Audio emotion classification from multiple datasets (2020). https://github.com/marcogdepinto/emotion-classification-from-audio-files (Accessed 31 Jan 2023)
43. de Pinto, M.G., Polignano, M., Lops, P., Semeraro, G.: Emotions understanding model from spoken language using deep neural networks and mel-frequency cepstral coefficients (May 2020). https://doi.org/10.1109/EAIS48028.2020.9122698
44. Popova, A.S., Rassadin, A.G., Ponomarenko, A.A.: Emotion recognition in sound, pp. 117–124 (Aug 2017). https://doi.org/10.1007/978-3-319-66604-4_18
45. Poria, S., Hazarika, D., Majumder, N., Mihalcea, R.: Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research. IEEE Trans. Affective Comput. (2020)
46. Poria, S., Hazarika, D., Majumder, N., Naik, G., Cambria, E., Mihalcea, R.: MELD: A multimodal multi-party dataset for emotion recognition in conversations. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 527–536 (2019)
47. Revina, I., Emmanuel, W.S.: A survey on human face expression recognition techniques. J. King Saud Univ. - Comput. Inform. Sci. 33(6), 619–628 (2021). https://doi.org/10.1016/j.jksuci.2018.09.002
48. Seknedy, M.E., Fawzi, S.: Speech emotion recognition system for human interaction applications (Dec 2021). https://doi.org/10.1109/ICICIS52592.2021.9694246
49. Shenk, J., CG, A., Arriaga, O., Owlwasrowk: justinshenk/fer: Zenodo (Sep 2021). https://doi.org/10.5281/zenodo.5362356
50. Siddiqui, M.F.H., Javaid, A.Y.: A multimodal facial emotion recognition framework through the fusion of speech with visible and infrared images. Multimodal Technol. Interact. 4(3), 46 (2020). https://doi.org/10.3390/mti4030046
51. Sokolova, M., Lapalme, G.: A systematic analysis of performance measures for classification tasks. Inform. Process. Manage. 45(4), 427–437 (2009)
52. Sorrentino, A., Mancioppi, G., Coviello, L., Cavallo, F., Fiorini, L.: Feasibility study on the role of personality, emotion, and engagement in socially assistive robotics: A cognitive assessment scenario. Informatics 8(2), 23 (2021). https://doi.org/10.3390/informatics8020023
53. Stock-Homburg, R.: Survey of emotions in human–robot interactions: perspectives from robotic psychology on 20 years of research. Int. J. Soc. Robot. 14(2), 389–411 (2021). https://doi.org/10.1007/s12369-021-00778-6
54. Wang, Z., Zeng, F., Liu, S., Zeng, B.: OAENet: Oriented attention ensemble for accurate facial expression recognition. Pattern Recogn. 112, 107694 (2021). https://doi.org/10.1016/j.patcog.2020.107694
55. Zavaschi, T.H.H., Koerich, A.L., Oliveira, L.E.S.: Facial expression recognition using ensemble of classifiers (May 2011). https://doi.org/10.1109/ICASSP.2011.5946775
56. Zhang, F., Zhang, T., Mao, Q., Xu, C.: Joint pose and expression modeling for facial expression recognition (Jun 2018). https://doi.org/10.1109/CVPR.2018.00354


Author information

Authors and Affiliations

LARSyS – Laboratory for Robotics and Engineering Systems, ISR-Lisbon, 1049-001, Lisboa, Portugal
Pedro J. S. Cardoso & João M. F. Rodrigues
Instituto Superior de Engenharia, Universidade do Algarve, 8005-129, Faro, Portugal
Pedro J. S. Cardoso, João M. F. Rodrigues & Rui Novais


Corresponding author

Correspondence to Pedro J. S. Cardoso.

Editor information

Editors and Affiliations

Czech Technical University in Prague, Prague, Czech Republic
Jiří Mikyška
University of Amsterdam, Amsterdam, The Netherlands
Clélia de Mulatier
AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M.A. Sloot



About this paper

Cite this paper

Cardoso, P.J.S., Rodrigues, J.M.F., Novais, R. (2023). Multimodal Emotion Classification Supported in the Aggregation of Pre-trained Classification Models. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 10477. Springer, Cham. https://doi.org/10.1007/978-3-031-36030-5_35


DOI: https://doi.org/10.1007/978-3-031-36030-5_35
Published: 26 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36029-9
Online ISBN: 978-3-031-36030-5
eBook Packages: Computer Science, Computer Science (R0)

