ABSTRACT
Humans naturally perceive each other's emotions through subtle body movements and speech expressions, and adapt the way they deliver and interpret messages accordingly. Socially assistive robots need to strengthen their ability to recognize emotions so they can adapt their interaction with humans, especially older adults. This paper presents a framework for speech emotion prediction supported by an ensemble of distinct out-of-the-box methods; the main contribution is the integration of those methods' outputs into a single prediction consistent with the expression presented by the system's user. Results show a classification accuracy of 75.56% on the RAVDESS dataset and 86.43% on a combined dataset composed of RAVDESS, SAVEE, and TESS.
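The fusion step described above — integrating the outputs of several out-of-the-box classifiers into a single emotion prediction — can be sketched minimally as soft voting over the per-model class-probability vectors. This is an illustrative assumption, not the paper's exact fusion rule; the label set and the `ensemble_predict` helper below are hypothetical, and the probability rows stand in for the outputs of real speech-emotion models.

```python
import numpy as np

# Hypothetical emotion label set (RAVDESS-style classes).
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def ensemble_predict(prob_rows):
    """Fuse per-model class-probability vectors into one prediction
    by averaging them (soft voting) and taking the most likely label."""
    fused = np.mean(np.asarray(prob_rows, dtype=float), axis=0)
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: three models' probability outputs for one utterance.
model_outputs = [
    [0.05, 0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05],  # model A: happy
    [0.10, 0.05, 0.50, 0.10, 0.10, 0.05, 0.05, 0.05],  # model B: happy
    [0.05, 0.05, 0.20, 0.05, 0.45, 0.10, 0.05, 0.05],  # model C: angry
]
label, fused = ensemble_predict(model_outputs)
# Most of the averaged probability mass falls on "happy".
```

Soft voting lets a confident minority model be outvoted only when its probability mass is small relative to the agreeing models, which tends to yield predictions more consistent with the speaker's actual expression than any single classifier.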