
Investigation of Speaker Group-Dependent Modelling for Recognition of Affective States from Speech

Cognitive Computation

Abstract

For successful human–machine interaction (HCI), a system needs to know not only the pure textual information but also the individual skills, preferences, and affective states of the user. Therefore, as a starting point, the user's current affective state has to be recognised. In this work, we investigated how additional knowledge about the user, for example age and gender, can be used to improve the recognition of affective states. Two methods from automatic speech recognition were used to incorporate age and gender differences into affect recognition: speaker group-dependent (SGD) modelling and vocal tract length normalisation (VTLN). The investigations were performed on four corpora with acted and natural affective speech. Different features and two classification methods, Gaussian mixture models (GMMs) and multi-layer perceptrons (MLPs), were used. In addition, the effects of channel compensation and contextual characteristics were analysed. The results are compared with our own baseline results and with results reported in the literature. Two hypotheses were tested: first, that incorporating age information further improves speaker group-dependent modelling; second, that acoustic normalisation does not achieve the same improvement as speaker group-dependent modelling, because the age and gender of a speaker affect the way emotions are expressed.
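The abstract names speaker group-dependent modelling but gives no implementation detail. The following is a minimal sketch of the general idea only, under assumptions not taken from the paper: scikit-learn's GaussianMixture as the GMM back-end, frame-level feature matrices (e.g. MFCCs) of shape (n_frames, n_dims), and a maximum average log-likelihood decision over the models of the speaker's own group. Group labels, component counts, and the toy data are purely illustrative, not the authors' setup.

```python
# Illustrative sketch of speaker group-dependent (SGD) emotion modelling with GMMs.
# Assumptions (not from the paper): scikit-learn GMMs, MFCC-like frame features,
# and a simple maximum average log-likelihood decision rule per speaker group.
import numpy as np
from sklearn.mixture import GaussianMixture


def train_sgd_models(train_data, n_components=32):
    """train_data: {(group, emotion): feature matrix of shape (n_frames, n_dims)}.
    Trains one GMM per (speaker group, emotion) pair."""
    models = {}
    for (group, emotion), feats in train_data.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", max_iter=200)
        models[(group, emotion)] = gmm.fit(feats)
    return models


def classify(models, group, utterance_feats):
    """Return the emotion whose group-specific GMM yields the highest
    average frame log-likelihood for the utterance."""
    scores = {emotion: gmm.score(utterance_feats)  # mean log-likelihood per frame
              for (g, emotion), gmm in models.items() if g == group}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two hypothetical speaker groups x two emotions, 20-dim frames.
    data = {(g, e): rng.normal(size=(500, 20))
            for g in ("female_young", "male_old") for e in ("neutral", "anger")}
    models = train_sgd_models(data, n_components=4)
    print(classify(models, "female_young", rng.normal(size=(120, 20))))
```

VTLN, by contrast, compensates speaker differences acoustically by warping the frequency axis with a speaker-specific factor before feature extraction rather than by training separate models per group.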




Acknowledgments

The work presented in this article was conducted within the Transregional Collaborative Research Centre SFB/TRR 62 “Companion-Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG). We also acknowledge the DFG for financing our computing cluster. Portions of the research in this article use the LAST MINUTE Corpus generated under the supervision of Professor Jörg Frommer and Professor Dietmar Rösner.

Author information

Corresponding author

Correspondence to Ingo Siegert.


About this article


Cite this article

Siegert, I., Philippou-Hübner, D., Hartmann, K. et al. Investigation of Speaker Group-Dependent Modelling for Recognition of Affective States from Speech. Cogn Comput 6, 892–913 (2014). https://doi.org/10.1007/s12559-014-9296-6

