
Investigation of Speaker Group-Dependent Modelling for Recognition of Affective States from Speech

Cognitive Computation

Abstract

For successful human–machine interaction (HCI), a system needs to know not only the pure textual information but also the individual skills, preferences, and affective states of the user. Therefore, as a starting point, the user's current affective state has to be recognised. In this work, we investigated how additional knowledge about the user, for example age and gender, can be used to improve the recognition of affective states. Two methods from automatic speech recognition were used to incorporate age and gender differences into affect recognition: speaker group-dependent (SGD) modelling and vocal tract length normalisation (VTLN). The investigations were performed on four corpora with acted and natural affective speech. Different features and two classification methods, Gaussian mixture models (GMMs) and multi-layer perceptrons (MLPs), were used. In addition, the effects of channel compensation and contextual characteristics were analysed. The results are compared with our own baseline results and with results reported in the literature. Two hypotheses were tested: first, that incorporating age information further improves speaker group-dependent modelling; second, that acoustic normalisation does not achieve the same improvement as speaker group-dependent modelling, because the age and gender of a speaker affect the way emotions are expressed.
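The abstract names speaker group-dependent modelling but gives no implementation detail. The following is a minimal sketch of the general idea only, under assumptions not taken from the paper: scikit-learn's GaussianMixture as the GMM back-end, frame-level feature matrices (e.g. MFCCs) of shape (n_frames, n_dims), and a maximum average log-likelihood decision over the models of the speaker's own group. Group labels, component counts, and the toy data are purely illustrative, not the authors' setup.

```python
# Illustrative sketch of speaker group-dependent (SGD) emotion modelling with GMMs.
# Assumptions (not from the paper): scikit-learn GMMs, MFCC-like frame features,
# and a simple maximum average log-likelihood decision rule per speaker group.
import numpy as np
from sklearn.mixture import GaussianMixture


def train_sgd_models(train_data, n_components=32):
    """train_data: {(group, emotion): feature matrix of shape (n_frames, n_dims)}.
    Trains one GMM per (speaker group, emotion) pair."""
    models = {}
    for (group, emotion), feats in train_data.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", max_iter=200)
        models[(group, emotion)] = gmm.fit(feats)
    return models


def classify(models, group, utterance_feats):
    """Return the emotion whose group-specific GMM yields the highest
    average frame log-likelihood for the utterance."""
    scores = {emotion: gmm.score(utterance_feats)  # mean log-likelihood per frame
              for (g, emotion), gmm in models.items() if g == group}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two hypothetical speaker groups x two emotions, 20-dim frames.
    data = {(g, e): rng.normal(size=(500, 20))
            for g in ("female_young", "male_old") for e in ("neutral", "anger")}
    models = train_sgd_models(data, n_components=4)
    print(classify(models, "female_young", rng.normal(size=(120, 20))))
```

VTLN, by contrast, compensates speaker differences acoustically by warping the frequency axis with a speaker-specific factor before feature extraction rather than by training separate models per group.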




Acknowledgments

The work presented in this article was conducted within the Transregional Collaborative Research Centre SFB/TRR 62 “Companion-Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG). We also acknowledge the DFG for financing our computing cluster. Portions of the research in this article use the LAST MINUTE Corpus generated under the supervision of Professor Jörg Frommer and Professor Dietmar Rösner.

Author information

Corresponding author

Correspondence to Ingo Siegert.


About this article


Cite this article

Siegert, I., Philippou-Hübner, D., Hartmann, K. et al. Investigation of Speaker Group-Dependent Modelling for Recognition of Affective States from Speech. Cogn Comput 6, 892–913 (2014). https://doi.org/10.1007/s12559-014-9296-6

