Abstract
Emotion recognition has become a rapidly growing research area in recent years, yet unlike humans, machines have no innate aptitude for recognizing emotions. This study aims to recognize five emotions, i.e., fear, anger, happiness, neutral, and sadness, from a speech signal; the German EmoDB dataset is used because it contains all five of these emotions. Prosodic features such as energy, voiced and unvoiced frame duration, and silence are extracted. When an individual expresses an emotion strongly, the amplitude and frequency fluctuations of the voice become large; such fluctuations can indicate an emotion, and they are quantified by the jitter and shimmer prosodic features, which are extracted from the voiced frames of a speech signal. Spectral tilt and zero-crossing rate are computed, and the frames of a speech segment are classified as voiced, unvoiced, or silence using Hamming windows; each frame comprises 960 speech samples. In addition, this study extracts a new feature set from the Fast Fourier transform (FFT): instead of using the full FFT of 960 samples, only the mean, minimum, maximum, variance, and standard deviation of each frame's FFT are used. We name these features FFTF. Thirteen Mel-frequency cepstral coefficient (MFCC) features are also extracted. A random forest classifier is used. In the first experiment, the effect of different numbers of trees is observed; with 100 trees, an emotion recognition accuracy of 78.6885% is attained. In the second experiment, different features of voiced and unvoiced frames are compared, and the FFT features likewise attain an accuracy of 78.6885%. In the third experiment, the number of features is reduced using PCA, which improves the accuracy to 83.60%.
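The FFTF idea described above (summarizing a 960-sample frame's FFT by five statistics) and the zero-crossing rate used for voiced/unvoiced classification can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' implementation; the function names and the Hamming weighting are our own choices, and the paper's exact thresholds for frame classification are not given in the abstract.

```python
import numpy as np

FRAME_LEN = 960  # samples per frame, as stated in the abstract


def fftf_features(frame):
    """Five summary statistics of a frame's FFT magnitude spectrum.

    Instead of keeping all FFT values of a 960-sample frame, only the
    mean, minimum, maximum, variance, and standard deviation are kept
    (the features the paper names FFTF).
    """
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    return np.array([spectrum.mean(), spectrum.min(), spectrum.max(),
                     spectrum.var(), spectrum.std()])


def zero_crossing_rate(frame):
    """Fraction of adjacent-sample sign changes in the frame.

    A high rate is typical of unvoiced frames, a low rate of voiced
    frames; combined with energy/spectral tilt it supports the
    voiced/unvoiced/silence classification mentioned in the abstract.
    """
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return float(np.mean(signs[:-1] != signs[1:]))
```

The five statistics reduce each frame's spectrum to a fixed-length vector, which keeps the per-frame feature count independent of the FFT size.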
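The third experiment's pipeline (feature reduction with PCA feeding a 100-tree random forest) can be sketched as below. This is a hedged stand-in assuming scikit-learn: the feature matrix here is synthetic random data, the number of retained components is an arbitrary illustration, and no accuracy figure from the paper should be expected from it.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the extracted feature matrix
# (rows = speech segments, columns = prosodic + FFTF + MFCC features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = rng.integers(0, 5, size=200)  # five emotion classes

# Scale, project onto principal components, then classify with a
# 100-tree random forest, mirroring the paper's third experiment.
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),  # illustrative choice of retained components
    RandomForestClassifier(n_estimators=100, random_state=0),
)
clf.fit(X, y)
```

Scaling before PCA matters here because the prosodic, FFTF, and MFCC features live on very different numeric ranges, and PCA is variance-driven.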
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Patil, S., Kharate, G.K. (2022). PCA-Based Random Forest Classifier for Speech Emotion Recognition Using FFTF Features, Jitter, and Shimmer. In: Mekhilef, S., Shaw, R.N., Siano, P. (eds) Innovations in Electrical and Electronic Engineering. ICEEE 2022. Lecture Notes in Electrical Engineering, vol 894. Springer, Singapore. https://doi.org/10.1007/978-981-19-1677-9_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1676-2
Online ISBN: 978-981-19-1677-9
eBook Packages: Energy (R0)