Abstract
Human-centric artificial intelligence still struggles to build automated emotion-recognition procedures that can be integrated into artificial systems such as user interfaces or social robots. In this context, this paper investigates an Emotion Multi-modal Aggregator (EMmA) that relies on a collection of open-source, single-source emotion classification methods whose outputs are aggregated to produce an emotion prediction. Although the approach is extensible to other sources, the tested solution takes a video clip and splits it into its frames and audio. A collection of primary classifiers is then applied to each source, and their results are combined by a final classifier using machine learning aggregation techniques. The aggregation techniques tested were Random Forest and k-Nearest Neighbors, which, with an accuracy of 80%, outperformed the primary classifiers on the selected dataset.
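The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the clip has already been split into frames and audio, and that two hypothetical primary classifiers (one per source) each return a probability vector over a made-up three-emotion label set. The function names, the label set, and all numbers are illustrative assumptions. A plain k-Nearest Neighbors vote written with the standard library stands in for the final classifier; in practice a library implementation, for example scikit-learn's `KNeighborsClassifier` or `RandomForestClassifier`, would likely be used instead.

```python
import math
from collections import Counter

# Hypothetical label set; the order matches the probability vectors below.
EMOTIONS = ["angry", "happy", "sad"]


def build_features(face_probs, audio_probs):
    """Concatenate the per-source probability vectors into one feature vector."""
    return list(face_probs) + list(audio_probs)


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training vectors closest to x."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up outputs of the two primary classifiers for six labelled clips.
train_X = [
    build_features([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]),
    build_features([0.6, 0.1, 0.3], [0.3, 0.3, 0.4]),
    build_features([0.1, 0.8, 0.1], [0.2, 0.7, 0.1]),
    build_features([0.2, 0.6, 0.2], [0.4, 0.5, 0.1]),
    build_features([0.1, 0.1, 0.8], [0.2, 0.2, 0.6]),
    build_features([0.3, 0.2, 0.5], [0.1, 0.2, 0.7]),
]
train_y = ["angry", "angry", "happy", "happy", "sad", "sad"]

# New clip: the face classifier weakly favours "happy",
# while the audio classifier strongly favours "sad".
query = build_features([0.35, 0.40, 0.25], [0.10, 0.15, 0.75])
prediction = knn_predict(train_X, train_y, query, k=3)
print(prediction)
```

With these toy numbers the aggregator returns "sad": two of the three nearest training clips carry that label, so the confident audio output outweighs the weak facial one. This is the kind of disagreement between sources that a learned aggregator is meant to resolve.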
We thank the Portuguese Foundation for Science and Technology (FCT) under Project UIDB/50009/2020—LARSyS.
Notes
1. See, e.g., [51] for the definitions of Accuracy, Precision, Recall, and F1-score.
References
Abdi, J., Al-Hindawi, A., Ng, T., Vizcaychipi, M.P.: Scoping review on the use of socially assistive robot technology in elderly care. BMJ Open 8(2), e018815 (2018). https://doi.org/10.1136/bmjopen-2017-018815
Abdollahi, H., Mahoor, M., Zandie, R., Sewierski, J., Qualls, S.: Artificial emotional intelligence in socially assistive robots for older adults: A pilot study. IEEE Trans. Affect. Comput. (2022). https://doi.org/10.1109/taffc.2022.3143803
Ahmed, F., Bari, A.S.M.H., Gavrilova, M.L.: Emotion recognition from body movement. IEEE Access 8, 11761–11781 (2020). https://doi.org/10.1109/ACCESS.2019.2963113
Ali, G., et al.: Artificial neural network based ensemble approach for multicultural facial expressions analysis. IEEE Access 8, 134950–134963 (2020). https://doi.org/10.1109/ACCESS.2020.3009908
Alonso-Martín, F., Malfaz, M., Sequeira, J., Gorostiza, J.F., Salichs, M.A.: A multimodal emotion detection system during human-robot interaction. Sensors 13(11), 15549–15581 (2013). https://doi.org/10.3390/s131115549, https://www.mdpi.com/1424-8220/13/11/15549
Ardabili, S., Mosavi, A., Várkonyi-Kóczy, A.R.: Advances in machine learning modeling reviewing hybrid and ensemble methods, pp. 215–227 (2020). https://doi.org/10.1007/978-3-030-36841-8_21
Banerjee, R., De, S., Dey, S.: A survey on various deep learning algorithms for an efficient facial expression recognition system. Int. J. Image Graph. (2021). https://doi.org/10.1142/S0219467822400058
Benamara, N.K., et al.: Real-time facial expression recognition using smoothed deep neural network ensemble. Integrated Comput. Aided Eng. 28(1), 97–111 (2020). https://doi.org/10.3233/ICA-200643
Bhatia, A., Rathee, A.: Multimodal emotion recognition (2020). https://github.com/ankurbhatia24/multimodal-emotion-recognition (Accessed 31 Jan 2023)
Birjali, M., Kasri, M., Beni-Hssane, A.: A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl.-Based Syst. 226, 107134 (2021). https://doi.org/10.1016/j.knosys.2021.107134, https://www.sciencedirect.com/science/article/pii/S095070512100397X
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B., et al.: A database of German emotional speech. In: Interspeech, vol. 5, pp. 1517–1520 (2005)
Burnwal, S.: Speech emotion recognition (2020). https://www.kaggle.com/code/shivamburnwal/speech-emotion-recognition/notebook (Accessed 31 Jan 2023)
Busso, C., et al.: IEMOCAP: Interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42, 335–359 (2008)
Canedo, D., Neves, A.: Mood estimation based on facial expressions and postures. In: Proceedings of the RECPAD, pp. 49–50 (2020)
Cao, H., Cooper, D.G., Keutmann, M.K., Gur, R.C., Nenkova, A., Verma, R.: CREMA-d: Crowd-sourced emotional multimodal actors dataset. IEEE Trans. Affective Comput. 5(4), 377–390 (2014). https://doi.org/10.1109/TAFFC.2014.2336244
Chen, M., He, X., Yang, J., Zhang, H.: 3-d convolutional recurrent neural networks with attention model for speech emotion recognition. IEEE Signal Processing Lett. 25(10), 1440–1444 (2018). https://doi.org/10.1109/LSP.2018.2860246
Cheng, B., Wang, Y., Shao, D., Arora, C., Hoang, T., Liu, X.: Edge4emotion: An edge computing based multi-source emotion recognition platform for human-centric software engineering. In: 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 610–613 (2021). https://doi.org/10.1109/CCGrid51090.2021.00071
Ekman, P.: Facial expressions of emotion: New findings, new questions. Psychol. Sci. 3(1), 34–38 (1992). https://doi.org/10.1111/j.1467-9280.1992.tb00253.x
Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 17(2), 124–129 (1971). https://doi.org/10.1037/h0030377
Filntisis, P.P., Efthymiou, N., Koutras, P., Potamianos, G., Maragos, P.: Fusing body posture with facial expressions for joint recognition of affect in child-robot interaction. IEEE Robot. Autom. Lett. 4(4), 4011–4018 (2019). https://doi.org/10.1109/LRA.2019.2930434
Getson, C., Nejat, G.: Socially assistive robots helping older adults through the pandemic and life after COVID-19. Robotics 10(3), 106 (2021). https://doi.org/10.3390/robotics10030106
Goodfellow, I.J., et al.: Challenges in representation learning: a report on three machine learning contests. In: Lee, M., Hirose, A., Hou, Z.-G., Kil, R.M. (eds.) ICONIP 2013. LNCS, vol. 8228, pp. 117–124. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42051-1_16
Haq, S., Jackson, P.: Machine Audition: Principles, Algorithms and Systems, chap. Multimodal Emotion Recognition, pp. 398–423. IGI Global, Hershey PA (Aug 2010)
Heredia, J., et al.: Adaptive multimodal emotion detection architecture for social robots. IEEE Access 10, 20727–20744 (2022). https://doi.org/10.1109/ACCESS.2022.3149214
Jolly, E., Cheong, J.H., Xie, T., Byrne, S., Kenny, M., Chang, L.J.: Py-feat: Python facial expression analysis toolbox. arXiv preprint arXiv:2104.03509 (2021)
Jolly, E., Cheong, J.H., Xie, T., Byrne, S., Kenny, M., Chang, L.J.: Py-feat: Python facial expression analysis toolbox (2021). https://doi.org/10.48550/arXiv.2104.03509
Jolly, E., Cheong, J.H., Xie, T., Byrne, S., Kenny, M., Chang, L.J.: Py-feat: Python facial expression analysis toolbox (2023). https://pythonrepo.com/repo/cosanlab-py-feat-python-deep-learning (Accessed 31 Jan 2023)
Kleinsmith, A., Bianchi-Berthouze, N.: Affective body expression perception and recognition: A survey. IEEE Trans. Affective Comput. 4(1), 15–33 (2013). https://doi.org/10.1109/T-AFFC.2012.16
Kumaran, U., Radha Rammohan, S., Nagarajan, S.M., Prathik, A.: Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN. Int. J. Speech Technol. 24(2), 303–314 (2021). https://doi.org/10.1007/s10772-020-09792-x
Li, S., Deng, W.: Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition. IEEE Trans. Image Process. 28(1), 356–370 (2019)
Liang, G., Wang, S., Wang, C.: Pose-aware adversarial domain adaptation for personalized facial expression recognition. arXiv preprint arXiv:2007.05932 (2020)
Livingstone, S.R., Russo, F.A.: The Ryerson audio-visual database of emotional speech and song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5), e0196391 (2018). https://doi.org/10.1371/journal.pone.0196391
Noroozi, F., Corneanu, C.A., Kamińska, D., Sapiński, T., Escalera, S., Anbarjafari, G.: Survey on emotional body gesture recognition. IEEE Trans. Affect. Comput. 12(2), 505–523 (2018)
Novais, R., Cardoso, P.J.S., Rodrigues, J.M.F.: Emotion classification from speech by an ensemble strategy. In: ACM 10th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion (DSAI 2022) (2022)
Novais, R., Cardoso, P.J.S., Rodrigues, J.M.F.: Facial emotions classification supported in an ensemble strategy, pp. 477–488 (2022). https://doi.org/10.1007/978-3-031-05028-2_32
Ortega, J.D.S., Cardinal, P., Koerich, A.L.: Emotion recognition using fusion of audio and video features. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 3847–3852 (2019). https://doi.org/10.1109/SMC.2019.8914655
Palanisamy, K., Singhania, D., Yao, A.: Rethinking CNN models for audio classification (2020). https://doi.org/10.48550/arXiv.2007.11154
Pecoraro, R., Basile, V., Bono, V.: Local multi-head channel self-attention for facial expression recognition. Information 13(9), 419 (2022)
Pecoraro, R., Basile, V., Bono, V., Gallo, S.: Lhc-net: Local multi-head channel self-attention (code). https://github.com/bodhis4ttva/lhc_net (Accessed 29 Jan 2023)
Pedregosa, F., et al.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Pichora-Fuller, M.K., Dupuis, K.: Toronto emotional speech set (TESS) (2020). https://doi.org/10.5683/SP2/E8H2MF
de Pinto, M.G.: Audio emotion classification from multiple datasets (2020). https://github.com/marcogdepinto/emotion-classification-from-audio-files (Accessed 31 Jan 2023)
de Pinto, M.G., Polignano, M., Lops, P., Semeraro, G.: Emotions understanding model from spoken language using deep neural networks and mel-frequency cepstral coefficients (May 2020). https://doi.org/10.1109/EAIS48028.2020.9122698
Popova, A.S., Rassadin, A.G., Ponomarenko, A.A.: Emotion recognition in sound, pp. 117–124 (Aug 2017). https://doi.org/10.1007/978-3-319-66604-4_18
Poria, S., Hazarika, D., Majumder, N., Mihalcea, R.: Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research. IEEE Trans. Affective Comput. (2020)
Poria, S., Hazarika, D., Majumder, N., Naik, G., Cambria, E., Mihalcea, R.: MELD: A multimodal multi-party dataset for emotion recognition in conversations. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 527–536 (2019)
Revina, I., Emmanuel, W.S.: A survey on human face expression recognition techniques. J. King Saud Univ. - Comput. Inform. Sci. 33(6), 619–628 (2021). https://doi.org/10.1016/j.jksuci.2018.09.002
Seknedy, M.E., Fawzi, S.: Speech emotion recognition system for human interaction applications (Dec 2021). https://doi.org/10.1109/ICICIS52592.2021.9694246
Shenk, J., CG, A., Arriaga, O., Owlwasrowk: justinshenk/fer: Zenodo (Sep 2021). https://doi.org/10.5281/zenodo.5362356
Siddiqui, M.F.H., Javaid, A.Y.: A multimodal facial emotion recognition framework through the fusion of speech with visible and infrared images. Multimodal Technol. Interact. 4(3), 46 (2020). https://doi.org/10.3390/mti4030046
Sokolova, M., Lapalme, G.: A systematic analysis of performance measures for classification tasks. Inform. Process. Manage. 45(4), 427–437 (2009)
Sorrentino, A., Mancioppi, G., Coviello, L., Cavallo, F., Fiorini, L.: Feasibility study on the role of personality, emotion, and engagement in socially assistive robotics: A cognitive assessment scenario. Informatics 8(2), 23 (2021). https://doi.org/10.3390/informatics8020023
Stock-Homburg, R.: Survey of emotions in human–robot interactions: perspectives from robotic psychology on 20 years of research. Int. J. Soc. Robot. 14(2), 389–411 (2021). https://doi.org/10.1007/s12369-021-00778-6
Wang, Z., Zeng, F., Liu, S., Zeng, B.: OAENet: Oriented attention ensemble for accurate facial expression recognition. Pattern Recogn. 112, 107694 (2021). https://doi.org/10.1016/j.patcog.2020.107694
Zavaschi, T.H.H., Koerich, A.L., Oliveira, L.E.S.: Facial expression recognition using ensemble of classifiers (May 2011). https://doi.org/10.1109/ICASSP.2011.5946775
Zhang, F., Zhang, T., Mao, Q., Xu, C.: Joint pose and expression modeling for facial expression recognition (Jun 2018). https://doi.org/10.1109/CVPR.2018.00354
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cardoso, P.J.S., Rodrigues, J.M.F., Novais, R. (2023). Multimodal Emotion Classification Supported in the Aggregation of Pre-trained Classification Models. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 10477. Springer, Cham. https://doi.org/10.1007/978-3-031-36030-5_35
DOI: https://doi.org/10.1007/978-3-031-36030-5_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36029-9
Online ISBN: 978-3-031-36030-5
eBook Packages: Computer Science, Computer Science (R0)