DOI: 10.1145/3563137.3563170
research-article
Open Access

Emotion Classification from Speech by an Ensemble Strategy

Published: 25 May 2023

ABSTRACT

Humans readily perceive each other's emotions through subtle body movements and speech expressions, and adjust how they deliver and interpret messages accordingly. Socially assistive robots need to strengthen their ability to recognize emotions so that they can adapt their interaction with humans, especially with older adults. This paper presents a framework for speech emotion prediction supported by an ensemble of distinct out-of-the-box methods; its main contribution is the integration of the outputs of those methods into a single prediction consistent with the expression presented by the system's user. Results show a classification accuracy of 75.56% on the RAVDESS dataset and 86.43% on a combined dataset comprising RAVDESS, SAVEE, and TESS.
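The abstract describes combining the outputs of several out-of-the-box classifiers into a single prediction. The paper's actual integration scheme is not detailed here; the sketch below illustrates one common output-level ensembling approach (weighted soft voting over per-class probabilities), with emotion labels and probability values invented for the example.

```python
# Hypothetical sketch of output-level ensembling via soft voting.
# The emotion labels match the RAVDESS categories; the classifier
# outputs below are made-up numbers, not results from the paper.

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def soft_vote(prob_rows, weights=None):
    """Average per-class probabilities from several classifiers
    and return the emotion with the highest combined score."""
    if weights is None:
        weights = [1.0] * len(prob_rows)
    total = [0.0] * len(EMOTIONS)
    for w, row in zip(weights, prob_rows):
        for i, p in enumerate(row):
            total[i] += w * p
    # Argmax over the accumulated scores.
    return EMOTIONS[max(range(len(total)), key=total.__getitem__)]

# Probability outputs of three hypothetical classifiers for one utterance:
clf_a = [0.05, 0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05]
clf_b = [0.10, 0.10, 0.40, 0.10, 0.10, 0.10, 0.05, 0.05]
clf_c = [0.05, 0.05, 0.30, 0.05, 0.40, 0.05, 0.05, 0.05]

print(soft_vote([clf_a, clf_b, clf_c]))  # -> "happy"
```

Two of the three classifiers favor "happy", so the averaged scores select it even though one classifier preferred "angry"; per-classifier weights could encode confidence in each method.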


  • Published in

    DSAI '22: Proceedings of the 10th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion
    August 2022, 237 pages
    ISBN: 9781450398077
    DOI: 10.1145/3563137

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery, New York, NY, United States



    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 17 of 23 submissions, 74%
