Current approaches to the recognition of emotion in speech usually rely on statistical feature information obtained by applying functionals at the turn or chunk level. Yet it is well known that important information on temporal sub-layers, such as the frame level, is thereby lost. We therefore investigate the benefits of integrating such information into the turn-level feature space. For frame-level analysis we use GMMs for classification, with 39 MFCC and energy features after cepstral mean subtraction (CMS). In a subsequent step, the output scores are fed forward into a turn-level SVM emotion recognition engine operating on a large feature space of roughly 1.4k features. Here we use a variety of low-level descriptors and functionals to cover prosodic, speech quality, and articulatory aspects. Extensive test runs are carried out on the public databases EMO-DB and SUSAS. Speaker-independent analysis is addressed by speaker normalization. Overall, the results strongly emphasize the benefits of feature integration across diverse time scales.
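To make the two-stage fusion concrete, the following is a minimal sketch, not the authors' implementation: it assumes one GMM per emotion class trained on frame-level MFCC+energy vectors, whose per-turn log-likelihood scores are appended to the turn-level functional features before SVM classification. All function and variable names (e.g. train_frame_level_gmms, build_turn_features) are hypothetical.

```python
# Hypothetical sketch of frame/turn-level fusion; not the paper's original code.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def train_frame_level_gmms(frames_by_class, n_components=8):
    """Fit one GMM per emotion class on frame-level MFCC+energy vectors."""
    return {label: GaussianMixture(n_components=n_components).fit(frames)
            for label, frames in frames_by_class.items()}

def gmm_score_vector(gmms, turn_frames):
    """Per-class average frame log-likelihood for one turn (the 'output scores')."""
    return np.array([gmms[label].score(turn_frames) for label in sorted(gmms)])

def build_turn_features(turn_functionals, gmms, turn_frames):
    """Append the frame-level GMM scores to the turn-level functional features."""
    return np.concatenate([turn_functionals, gmm_score_vector(gmms, turn_frames)])

# Usage (assumed data layout):
#   frames_by_class : dict mapping emotion label -> stacked frame vectors
#   X_func          : list of turn-level functional feature vectors
#   frames_per_turn : list of per-turn frame matrices, y : turn labels
# gmms = train_frame_level_gmms(frames_by_class)
# X = np.vstack([build_turn_features(f, gmms, fr)
#                for f, fr in zip(X_func, frames_per_turn)])
# clf = SVC(kernel="linear").fit(X, y)
```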
Cite as: Vlasenko, B., Schuller, B., Wendemuth, A., Rigoll, G. (2007) Combining frame and turn-level information for robust recognition of emotions within speech. Proc. Interspeech 2007, 2249-2252, doi: 10.21437/Interspeech.2007-611
@inproceedings{vlasenko07_interspeech,
  author={Bogdan Vlasenko and Björn Schuller and Andreas Wendemuth and Gerhard Rigoll},
  title={{Combining frame and turn-level information for robust recognition of emotions within speech}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2249--2252},
  doi={10.21437/Interspeech.2007-611}
}