Efficient feature combination techniques for emotional speech classification

International Journal of Speech Technology

Abstract

Driven by the challenge of making spoken-language man–machine interfaces more natural and efficient, the identification and classification of emotional speech has become a predominant research area. The reliability and accuracy of such emotion identification depend greatly on feature selection and extraction. In this paper, a combined-feature technique is proposed that uses the reduced feature set produced by a vector quantizer (VQ) for classification in a Radial Basis Function Neural Network (RBFNN) environment. In the initial stage, Linear Prediction Coefficients (LPC) and the time–frequency Hurst parameter (pH) are used to extract the relevant features, the two carrying complementary information about the emotional speech. Extensive simulations have been carried out on the Berlin Database of Emotional Speech (EMO-DB) with various combinations of feature sets. The experimental results show 76% accuracy for pH and 68% for LPC as standalone feature sets, whereas combining the VQ-reduced feature sets (LP VQC and pH VQC) raises the average accuracy to 90.55%.
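
To make the pipeline concrete, the following is a minimal Python sketch of the kind of approach the abstract describes: frame-level LPC features, a per-frame Hurst estimate, per-utterance vector quantization to obtain reduced codebook features (analogues of the "VQC" sets), and a Gaussian RBF network classifier. It is an illustration under stated assumptions, not the authors' implementation: the Hurst estimate here uses a simple rescaled-range method rather than the paper's wavelet-based time–frequency pH, and all parameter values (LPC order, codebook size, number of RBF centers, kernel width) are placeholders.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.linalg import lstsq


def lpc(frame, order=12):
    """LPC coefficients of one frame via autocorrelation + Levinson-Durbin."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] += k * np.concatenate((a[i - 1:0:-1], [1.0]))
        err *= 1.0 - k * k
    return a[1:]  # drop the leading 1 of the prediction polynomial


def hurst_rs(x, scales=(16, 32, 64, 128)):
    """Rescaled-range Hurst estimate; a simplified stand-in for the
    paper's wavelet-based time-frequency pH feature."""
    log_n, log_rs = [], []
    for n in scales:
        if n > len(x):
            continue
        chunks = x[:len(x) - len(x) % n].reshape(-1, n)
        z = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        r = z.max(axis=1) - z.min(axis=1)   # range of cumulative deviations
        s = chunks.std(axis=1)              # per-chunk standard deviation
        ok = s > 0
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(r[ok] / s[ok])))
    return np.polyfit(log_n, log_rs, 1)[0]  # slope of log(R/S) vs log(n)


def vqc(frame_features, k=8):
    """Vector-quantize frame-level features; the flattened codebook serves
    as the utterance-level reduced feature vector (an 'LP VQC' / 'pH VQC'
    analogue)."""
    centers, _ = kmeans2(frame_features, k, minit="++")
    return np.ravel(centers)


class RBFNN:
    """Minimal Gaussian RBF network: k-means centers, least-squares
    output weights, argmax over one-hot class targets."""

    def fit(self, X, y, n_centers=20, sigma=1.0):
        self.centers, _ = kmeans2(X, n_centers, minit="++")
        self.sigma = sigma
        targets = np.eye(y.max() + 1)[y]          # one-hot class targets
        self.W, *_ = lstsq(self._hidden(X), targets)
        return self

    def _hidden(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def predict(self, X):
        return (self._hidden(X) @ self.W).argmax(axis=1)
```

In this sketch, each utterance would contribute something like `np.concatenate([vqc(lpc_frames), vqc(ph_values)])` as its combined feature vector; stacking those vectors over the training set gives the `X`, `y` pair passed to `RBFNN().fit`. The names, codebook size, and network sizes are illustrative choices, not values taken from the paper.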


Author information

Corresponding author

Correspondence to Mihir Narayan Mohanty.

About this article

Cite this article

Palo, H.K., Mohanty, M.N. & Chandra, M. Efficient feature combination techniques for emotional speech classification. Int J Speech Technol 19, 135–150 (2016). https://doi.org/10.1007/s10772-016-9333-9
