Abstract
For most people, speaking is a natural way to interact with others, and speech is a tool for understanding them. Speech utterances are statements made by a speaker in a discussion that convey knowledge or a message. Speech processing is the study of how to analyze and process human speech signals or voice samples. Speech utterances carry the speaker's individual qualities, including feelings, bodily states, and information about the environment. In this paper, we adopt constant-Q cepstral coefficients (CQCC) and relative phase (RP) features; from the relative phase feature we further derive the linear prediction residual-based relative phase (LPR-RP) and the LPA-estimated-speech-based relative phase (LPAES-RP). Additionally, to improve system performance, the LPAES-RP and CQCC features are combined. These features are used to train a deep feed-forward neural network, and a softmax layer recognizes the given speech emotions. Among all the evaluated systems, the highest accuracy, 92.78%, is obtained when the LPAES-RP and CQCC features are combined.
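To make the pipeline concrete, the following is a minimal Python sketch of the fused magnitude-phase front end and classifier outlined above, assuming librosa, SciPy, and PyTorch. The function names, the input file `speech.wav`, the base-bin choice, and the network sizes are illustrative assumptions rather than the paper's exact configuration; the LP-residual variants (LPR-RP, LPAES-RP) and the uniform resampling step of the full CQCC algorithm are omitted.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn
from scipy.fftpack import dct

def cqcc(y, sr, n_bins=96, bins_per_octave=12, n_coeff=20, hop=512):
    """Simplified CQCC: constant-Q transform -> log power -> DCT.
    (Full CQCC also uniformly resamples the log spectrum before the
    DCT; that step is omitted in this sketch.)"""
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop,
                           n_bins=n_bins, bins_per_octave=bins_per_octave))
    return dct(np.log(C**2 + 1e-10), type=2, axis=0, norm='ortho')[:n_coeff]

def relative_phase(y, n_fft=512, hop=512, base_bin=1):
    """Simplified relative phase: shift each frame's STFT phase so the
    phase at a chosen base frequency bin is held constant across frames,
    then encode with cos/sin to avoid 2*pi wrapping."""
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    theta = np.angle(S)                              # (bins, frames)
    k = np.arange(S.shape[0])[:, None]               # bin indices
    psi = theta - (k / base_bin) * theta[base_bin]   # relative phase
    return np.vstack([np.cos(psi), np.sin(psi)])

# Frame-level fusion of magnitude (CQCC) and phase (RP) features.
y, sr = librosa.load("speech.wav", sr=16000)         # hypothetical input
cq, rp = cqcc(y, sr), relative_phase(y)
T = min(cq.shape[1], rp.shape[1])                    # align frame counts
x = torch.tensor(np.vstack([cq[:, :T], rp[:, :T]]).T, dtype=torch.float32)

# Deep feed-forward classifier; layer sizes are illustrative, and during
# training the softmax would typically be folded into CrossEntropyLoss.
model = nn.Sequential(
    nn.Linear(x.shape[1], 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 8),          # e.g. 8 emotion classes (RAVDESS)
)
probs = torch.softmax(model(x), dim=1)  # per-frame emotion probabilities
```

In this sketch, per-frame probabilities would be averaged (or otherwise pooled) over an utterance to yield a single predicted emotion.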
Data Availability
The datasets generated and analyzed in this study are available from the corresponding author upon reasonable request.
Acknowledgements
The authors acknowledge REVA University, Bangalore, Karnataka, India, for supporting this research work by providing the facilities.
Funding
No funding was received for this research.
Author information
Contributions
This research was a collective effort, made possible through the collaboration and contributions of all authors involved.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances in Computational Approaches for Image Processing, Wireless Networks, Cloud Applications and Network Security” guest edited by P. Raviraj, Maode Ma and Roopashree H R.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shankar, D.R., Manjula, R.B. & Biradar, R.C. Speech Emotion Recognition Using Magnitude and Phase Features. SN COMPUT. SCI. 5, 532 (2024). https://doi.org/10.1007/s42979-024-02833-1