Abstract
For most people, speaking is a natural way to interact with others, and speech is a tool for understanding them. Speech utterances are statements made by a speaker in a discussion that convey knowledge or a message. Speech processing is the study of how to analyze and process human speech signals or voice samples. Speech utterances carry the speaker's individual qualities, including feelings, bodily states, and information about the environment. In this paper, we adopt constant-Q cepstral coefficients (CQCC) and relative phase (RP) features; from the relative phase feature we further derive the linear prediction residual-based relative phase (LPR-RP) and the LPA-estimated-speech-based relative phase (LPAES-RP). Additionally, to improve system performance, the LPAES-RP and CQCC features are combined. These features are used to train a deep feed-forward neural network, and a softmax layer recognizes the given speech emotions. Among all the evaluated systems, the highest accuracy, 92.78%, is obtained when the LPAES-RP and CQCC features are combined.
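To make the pipeline concrete, the following is a minimal Python sketch of the fused magnitude-phase front end and classifier outlined above, assuming librosa, SciPy, and PyTorch. The function names, the input file `speech.wav`, the base-bin choice, and the network sizes are illustrative assumptions rather than the paper's exact configuration; the LP-residual variants (LPR-RP, LPAES-RP) and the uniform resampling step of the full CQCC algorithm are omitted.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn
from scipy.fftpack import dct

def cqcc(y, sr, n_bins=96, bins_per_octave=12, n_coeff=20, hop=512):
    """Simplified CQCC: constant-Q transform -> log power -> DCT.
    (Full CQCC also uniformly resamples the log spectrum before the
    DCT; that step is omitted in this sketch.)"""
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop,
                           n_bins=n_bins, bins_per_octave=bins_per_octave))
    return dct(np.log(C**2 + 1e-10), type=2, axis=0, norm='ortho')[:n_coeff]

def relative_phase(y, n_fft=512, hop=512, base_bin=1):
    """Simplified relative phase: shift each frame's STFT phase so the
    phase at a chosen base frequency bin is held constant across frames,
    then encode with cos/sin to avoid 2*pi wrapping."""
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    theta = np.angle(S)                              # (bins, frames)
    k = np.arange(S.shape[0])[:, None]               # bin indices
    psi = theta - (k / base_bin) * theta[base_bin]   # relative phase
    return np.vstack([np.cos(psi), np.sin(psi)])

# Frame-level fusion of magnitude (CQCC) and phase (RP) features.
y, sr = librosa.load("speech.wav", sr=16000)         # hypothetical input
cq, rp = cqcc(y, sr), relative_phase(y)
T = min(cq.shape[1], rp.shape[1])                    # align frame counts
x = torch.tensor(np.vstack([cq[:, :T], rp[:, :T]]).T, dtype=torch.float32)

# Deep feed-forward classifier; layer sizes are illustrative, and during
# training the softmax would typically be folded into CrossEntropyLoss.
model = nn.Sequential(
    nn.Linear(x.shape[1], 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 8),          # e.g. 8 emotion classes (RAVDESS)
)
probs = torch.softmax(model(x), dim=1)  # per-frame emotion probabilities
```

In this sketch, per-frame probabilities would be averaged (or otherwise pooled) over an utterance to yield a single predicted emotion.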
Data Availability
The datasets generated and analyzed in this study are available from the corresponding author upon reasonable request.
Acknowledgements
The authors acknowledge REVA University, Bangalore, Karnataka, India, for supporting this research work by providing the facilities.
Funding
No funding was received for this research.
Author information
Contributions
This research was a collective effort, made possible through the collaboration and contributions of all authors involved.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances in Computational Approaches for Image Processing, Wireless Networks, Cloud Applications and Network Security” guest edited by P. Raviraj, Maode Ma and Roopashree H R.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shankar, D.R., Manjula, R.B. & Biradar, R.C. Speech Emotion Recognition Using Magnitude and Phase Features. SN COMPUT. SCI. 5, 532 (2024). https://doi.org/10.1007/s42979-024-02833-1