Abstract
This work presents a novel method for system fusion in emotion recognition from speech. The proposed approach, Anchor Model Fusion (AMF), exploits the characteristic behaviour of the scores of a speech utterance across different emotion models by mapping them to a back-end anchor-model feature space and classifying with an SVM. Experiments are presented on three databases: Ahumada III, containing speech from real forensic cases; SUSAS Actual; and SUSAS Simulated. Compared with a simple sum-fusion scheme after normalization, AMF yields a significant performance improvement on two of the three experimental set-ups without degrading performance on the third.
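The abstract's pipeline, mapping each utterance's scores against a set of emotion models into an anchor-model feature vector and then classifying that vector with an SVM, can be sketched as follows. This is a minimal illustration with synthetic data, not the authors' implementation: the array names, the choice of two subsystems, and the linear kernel are all assumptions for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_emotions, n_utts = 4, 40

# Hypothetical per-utterance scores against every emotion model, from two
# front-end subsystems (names and shapes are illustrative assumptions).
scores_sys_a = rng.normal(size=(n_utts, n_emotions))
scores_sys_b = rng.normal(size=(n_utts, n_emotions))

# Anchor-model feature space: the concatenated score vectors, so the SVM
# back-end can exploit the pattern of scores across models and subsystems.
anchor_features = np.hstack([scores_sys_a, scores_sys_b])
labels = rng.integers(0, n_emotions, size=n_utts)

# Back-end classifier over the anchor-model features.
clf = SVC(kernel="linear").fit(anchor_features, labels)
pred = clf.predict(anchor_features)
```

In practice the SVM would be trained on held-out data and the scores would come from real emotion models; the point here is only the shape of the mapping from per-model scores to a single back-end feature vector per utterance.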
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Ortego-Resa, C., Lopez-Moreno, I., Ramos, D., Gonzalez-Rodriguez, J. (2009). Anchor Model Fusion for Emotion Recognition in Speech. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04390-1
Online ISBN: 978-3-642-04391-8