Abstract
Speech emotion recognition is essentially a sequence analysis task, so LSTM models are an appropriate benchmark for automatic recognition of emotion in speech. This work compares the performance of a stacked CNN-LSTM architecture against a stand-alone LSTM architecture for emotion recognition. Its key contribution is the use of the stacked CNN-LSTM architecture together with augmentation of the training data to obtain robust and reliable performance. Results are reported on the RAVDESS database. MFCCs extracted from the preprocessed raw audio files serve as input to the models. Accuracy and other metrics indicate that the hybrid CNN-LSTM achieves higher recognition accuracy than the stand-alone LSTM architecture, and that data augmentation supports better learning and robustness.
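The abstract does not say which augmentation techniques were applied to the training data. As an illustration only, the following minimal sketch assumes two common waveform-level augmentations, additive Gaussian noise and a circular time shift, applied before MFCC extraction. The function names, parameter values, and the use of plain Python lists instead of audio arrays are all hypothetical and are not taken from the paper.

```python
import random


def add_noise(signal, noise_level=0.005, rng=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added per sample."""
    rng = rng or random.Random()
    return [s + rng.gauss(0.0, noise_level) for s in signal]


def time_shift(signal, shift):
    """Circularly shift `signal` by `shift` samples (positive values delay it)."""
    signal = list(signal)
    if not signal:
        return []
    k = shift % len(signal)
    return signal[-k:] + signal[:-k] if k else signal


def augment(signal, n_copies=2, max_shift=3, noise_level=0.005, seed=0):
    """Return the original waveform followed by `n_copies` perturbed variants.

    Assumed recipe (not from the paper): random circular shift, then noise.
    """
    rng = random.Random(seed)
    variants = [list(signal)]
    for _ in range(n_copies):
        shifted = time_shift(signal, rng.randint(-max_shift, max_shift))
        variants.append(add_noise(shifted, noise_level, rng))
    return variants


if __name__ == "__main__":
    waveform = [0.0, 0.5, -0.5, 0.25]  # toy stand-in for a preprocessed clip
    for variant in augment(waveform, n_copies=2):
        print([round(v, 4) for v in variant])
```

Each variant keeps the label of the clip it was derived from, so the training set grows without new recordings. In a real pipeline the same idea would normally be applied to audio arrays before computing MFCCs.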
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Swain, T., Anand, U., Aryan, Y., Khanra, S., Raj, A., Patnaik, S. (2021). Performance Comparison of LSTM Models for SER. In: Sabut, S.K., Ray, A.K., Pati, B., Acharya, U.R. (eds) Proceedings of International Conference on Communication, Circuits, and Systems. Lecture Notes in Electrical Engineering, vol 728. Springer, Singapore. https://doi.org/10.1007/978-981-33-4866-0_52
DOI: https://doi.org/10.1007/978-981-33-4866-0_52
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4865-3
Online ISBN: 978-981-33-4866-0
eBook Packages: Computer Science, Computer Science (R0)