Abstract
The challenge of enhancing the naturalness and efficiency of spoken-language man–machine interfaces has made emotional speech identification and classification a predominant research area. The reliability and accuracy of such emotion identification depend greatly on feature selection and extraction. In this paper, a combined feature-selection technique is proposed that uses the reduced feature set produced by a vector quantizer (VQ) in a Radial Basis Function Neural Network (RBFNN) environment for classification. In the initial stage, Linear Prediction Coefficients (LPC) and the time–frequency Hurst parameter (pH) are used to extract the relevant features, the two carrying complementary information about the emotional speech. Extensive simulations have been carried out on the Berlin Database of Emotional Speech (EMO-DB) with various combinations of feature sets. The experimental results show 76 % accuracy for pH and 68 % for LPC as standalone feature sets, whereas the combined feature sets (LP VQC and pH VQC) raise the average accuracy to 90.55 %.
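The pipeline summarized above — LPC and Hurst (pH) features compressed through a VQ codebook before classification — can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: `lpc` uses the standard autocorrelation/Levinson–Durbin method, `vq_codebook` is a plain k-means stand-in for an LBG-style vector quantizer, and `hurst_aggvar` is a simple aggregated-variance estimator rather than the paper's time–frequency pH; the RBFNN classifier stage is omitted.

```python
import numpy as np

def lpc(frame, order):
    """LPC via autocorrelation and the Levinson-Durbin recursion.

    Returns the `order` prediction coefficients a[1..order] of the model
    x[n] ~ -sum_k a[k] * x[n-k].
    """
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a[1:]

def vq_codebook(vectors, k=4, iters=20, seed=0):
    """Reduce a set of feature vectors to k codewords (plain k-means)."""
    rng = np.random.default_rng(seed)
    code = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword, then re-center.
        d = np.linalg.norm(vectors[:, None, :] - code[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                code[j] = vectors[labels == j].mean(axis=0)
    return code

def hurst_aggvar(x, scales=(2, 4, 8, 16, 32)):
    """Aggregated-variance Hurst estimate: Var(block means) ~ m**(2H - 2)."""
    v = []
    for m in scales:
        nblk = len(x) // m
        means = x[:nblk * m].reshape(nblk, m).mean(axis=1)
        v.append(means.var())
    slope = np.polyfit(np.log(scales), np.log(v), 1)[0]
    return 1.0 + slope / 2.0
```

As a usage sketch, each utterance would be framed, `lpc` applied per frame, and the resulting coefficient matrix reduced by `vq_codebook` to a fixed-size input for the classifier; `hurst_aggvar` would supply the parallel pH-style feature.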
Palo, H.K., Mohanty, M.N. & Chandra, M. Efficient feature combination techniques for emotional speech classification. Int J Speech Technol 19, 135–150 (2016). https://doi.org/10.1007/s10772-016-9333-9