
Speech Emotion Recognition Using Magnitude and Phase Features

  • Original Research
  • Published in: SN Computer Science

Abstract

For most people, speaking is the natural way to interact with others, and speech is a tool for understanding them. Speech utterances are the statements a speaker makes in a conversation to convey information or a message. Speech processing is the study of how to analyze and process human speech signals or voice samples. Utterances carry the speaker's individual characteristics, including emotional state, physiological condition, and information about the surrounding environment. In this paper, we adopt constant-Q cepstral coefficients (CQCC) and relative phase (RP) features; from the relative phase feature we further derive the linear-prediction residual relative phase (LRP-RP) and the LPA-estimated-speech relative phase (LPAES-RP). Additionally, to improve system performance, the LPAES-RP and CQCC features are combined. A deep feed-forward neural network is trained on these features, and a softmax layer is used to recognize the speech emotions. Among all the evaluated systems, the combination of LPAES-RP and CQCC features achieves the highest accuracy, 92.78%.
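The article itself does not reproduce the implementation; the following is a minimal Python sketch of the pipeline the abstract describes, assuming the standard librosa, SciPy, and scikit-learn APIs. The function names, frame settings, LP order, network size, and the simplified CQCC recipe (which omits the uniform resampling step of the full CQCC algorithm) are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch: simplified CQCC, relative phase (RP) of the
# LPA-estimated speech, feature fusion, and a feed-forward classifier
# with a softmax output. Not the authors' implementation.
import numpy as np
import librosa
import scipy.signal
from scipy.fftpack import dct
from sklearn.neural_network import MLPClassifier

def cqcc(y, sr, n_bins=84, bins_per_octave=12, n_coeffs=20):
    """Simplified CQCC: constant-Q transform -> log power -> DCT.
    (The full recipe also resamples the log spectrum uniformly,
    which this sketch omits.)"""
    C = np.abs(librosa.cqt(y, sr=sr, n_bins=n_bins,
                           bins_per_octave=bins_per_octave))
    log_power = np.log(C ** 2 + 1e-10)
    return dct(log_power, axis=0, norm='ortho')[:n_coeffs]

def relative_phase(y, n_fft=512, hop=160, base_bin=1):
    """Relative phase: shift each frame's STFT phase in proportion to
    frequency so the phase at a base frequency bin becomes zero,
    removing the dependence on frame position."""
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    phase = np.angle(S)
    k = np.arange(S.shape[0])[:, None]            # frequency-bin indices
    rp = phase - (k / base_bin) * phase[base_bin]
    return np.cos(rp), np.sin(rp)                 # continuous encoding

def lpaes_rp(y, order=12, **kw):
    """RP of the LPA-estimated speech (LPAES-RP): fit LP coefficients,
    reconstruct the predicted signal, then take its relative phase."""
    a = librosa.lpc(y, order=order)               # a[0] = 1
    residual = scipy.signal.lfilter(a, [1.0], y)  # inverse filtering
    predicted = y - residual                      # LPA-estimated speech
    return relative_phase(predicted, **kw)

def emotion_features(y, sr):
    """Fuse frame-averaged CQCC with LPAES-RP statistics."""
    c = cqcc(y, sr).mean(axis=1)
    cos_rp, sin_rp = lpaes_rp(y)
    return np.concatenate([c, cos_rp.mean(axis=1), sin_rp.mean(axis=1)])

def train_classifier(X, labels):
    """Deep feed-forward network; scikit-learn's MLPClassifier applies
    a softmax output for multi-class emotion labels."""
    clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500)
    return clf.fit(X, labels)
```

This sketch only shows the shape of the pipeline: it collapses each utterance to a single fused vector via frame averaging, whereas a system reaching the reported 92.78% would likely use frame-level features with temporal context and a tuned network.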


Data Availability

The dataset generated and analyzed in this study is available from the corresponding author upon reasonable request.


Acknowledgements

The authors acknowledge REVA University, Bangalore, Karnataka, India, for supporting this research by providing the facilities.

Funding

No funding was received for this research.

Author information


Contributions

This research was a collective effort; all authors collaborated on and contributed to the work.

Corresponding author

Correspondence to D. Ravi Shankar.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Image Processing, Wireless Networks, Cloud Applications and Network Security” guest edited by P. Raviraj, Maode Ma and Roopashree H R.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shankar, D.R., Manjula, R.B. & Biradar, R.C. Speech Emotion Recognition Using Magnitude and Phase Features. SN COMPUT. SCI. 5, 532 (2024). https://doi.org/10.1007/s42979-024-02833-1

